The Pennsylvania State University, Spring 2021 Stat 415-001, Hyebin Song

Hypothesis Testing


Introduction to Hypothesis Testing

Learning objectives

Understand the hypothesis testing framework
Understand basic terminology in hypothesis testing framework

Hypothesis testing framework

Hypothesis

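Suppose the parameter space $\Theta$ is partitioned into two disjoint sets $\Theta_0$ and $\Theta_1$, so that $\Theta = \Theta_0 \cup \Theta_1$.

The null hypothesis is the default claim that the parameter lies in $\Theta_0$, i.e., $\theta \in \Theta_0$.

Null hypothesis: $H_0: \theta \in \Theta_0$.
Alternative hypothesis: $H_1: \theta \notin \Theta_0$ (i.e., $\theta \in \Theta_1$).

If the observed data $(x_1,\dots,x_n)$ provide sufficient evidence against the null hypothesis, we reject $\theta\in\Theta_0$ (and conclude $\theta \in \Theta_1$); otherwise we do not reject $\theta \in \Theta_0$.

Here, sufficient evidence means that the observed data $(x_1,\dots,x_n)$ would be unlikely if the null hypothesis is true. We need to decide how unlikely the data has to be to reject the null hypothesis.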

Test statistic and rejection region

We use a test statistic and an associated rejection region (= critical region) to determine whether we have sufficient evidence to disprove the null hypothesis. We reject the null hypothesis if the observed test statistic is in the rejection region.

Remark : the decision is random, because the decision is based on the observed value of a test statistic. If we collect another sample, we would get a different observed test statistic value, and we may make a different decision.

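Example: Let $X$ be a measurement of interest, and suppose $X$ follows $N(\mu,100)$. The mean has previously been $\mu = 50$. A company takes a random sample of size $n=100$, and the sample mean of the $100$ observations is $50.5$. Because $50.5$ is larger than $50$, the company claims that the mean has increased from $50$.

Hypotheses: $H_0: \mu = 50, \, H_1: \mu > 50$.
Test statistic: $\bar{X}$.
Observed value of the test statistic: $\bar{x} = 50.5$.
Rejection region: $\{y; \,\, y\ge 50\} = [50, \infty)$.
Decision: since $\bar{x}$ is in the rejection region, the company rejected the null hypothesis.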

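Remark 1: The form of the rejection region depends on the alternative hypothesis $H_1$. If

$H_1: \theta > \theta_0$, the rejection region has the form $\{y;y \ge k\}$.
$H_1: \theta < \theta_0$, the rejection region has the form $\{y;y \le k\}$.
$H_1: \theta \ne \theta_0$, the rejection region has the form $\{y;y \le k_1 \mbox{ or } y\ge k_2\}$.

Remark: The rejection region depends on the constant $k$. How should we choose $k$?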

Two types of testing errors

For any fixed rejection region, two types of errors can be made in reaching a decision.

Type I error: rejecting $H_0$ when $H_0$ is true.
Type II error: not rejecting $H_0$ when $H_1$ is true.

The probability of a Type I error,

$$P_{\theta; \theta \in \Theta_0}({\rm Test \, statistic }\in {\rm Rejection \, Region}),$$

is called the significance level of the test.

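Example: In the previous example, what is the significance level of the test? What is the significance level if the rejection region is changed to $[51.645, \infty)$?

Original test (rejection region $[50,\infty)$):
$\alpha = P(\text{reject } H_0 \text{ when } H_0 \text{ is true}) = P_{\mu=50}(\bar{X} \ge 50) = 0.5$,
since under $H_0$ we have $\bar{X} \sim N(50, \frac{100}{100})$.
Modified test (rejection region $[51.645, \infty)$):
$\alpha = P(\text{reject } H_0 \text{ when } H_0 \text{ is true}) = P_{\mu=50}(\bar{X} \ge 51.645) = P_{\mu=50}(\frac{\bar{X}-50}{1} \ge 51.645-50) = P(Z \ge 1.645) = 0.05$.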
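
The two significance levels in this example can be checked numerically. The following is a minimal R sketch, assuming the setting of the example ($\bar{X} \sim N(50, 1)$ under $H_0$); the variable names are illustrative only.

```r
# Significance level = P(Xbar in rejection region), computed under H0: mu = 50.
# Under H0, Xbar ~ N(50, 100/100), i.e. its standard deviation is 1.
mu0 <- 50
se  <- sqrt(100 / 100)

# Rejection region [50, Inf)
alpha_50 <- pnorm(50, mean = mu0, sd = se, lower.tail = FALSE)
# Rejection region [51.645, Inf)
alpha_51_645 <- pnorm(51.645, mean = mu0, sd = se, lower.tail = FALSE)

print(c(alpha_50, alpha_51_645))  # about 0.5 and 0.05
```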

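Remark: In practice, we fix the significance level $\alpha$ in advance (commonly $\alpha = 0.05$) and choose the rejection region so that the test has significance level $\alpha$.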

Test statistic and p-value

So far, we have discussed how to determine whether we have sufficient evidence to reject the null hypothesis. The degree of sufficiency was determined by the significance level of the test.

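The decision depends on the choice of $\alpha$: the same data may lead to rejection at one $\alpha$ but not at a smaller $\alpha$.
If the null hypothesis can be rejected at a smaller $\alpha$, we have stronger evidence against the null hypothesis.

In the previous example ($H_0: \mu = 50, \, H_1: \mu > 50$), the rejection region is $[50,\infty)$ for $\alpha = 0.5$ and $[51.645,\infty)$ for $\alpha = 0.05$.

If $\bar{x} = 50.5$, we reject the null hypothesis at $\alpha=0.5$ but not at $\alpha = 0.05$.
If $\bar{x} = 52$, we reject the null hypothesis at both $\alpha=0.5$ and $\alpha = 0.05$.
If $\bar{x} = 51.645$, we reject the null hypothesis at both $\alpha=0.5$ and $\alpha = 0.05$.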

The p-value is the smallest significance level $\alpha$ of the test with which the null hypothesis can be rejected with the observed data.

In the previous example ($H_0: \mu = 50, \, H_1: \mu > 50$), the smallest rejection region that still contains the observed value is:

If $\bar{x} = 50.5$: $[50.5, \infty)$.
If $\bar{x} = 52$: $[52, \infty)$.
If $\bar{x} = 51.645$: $[51.645, \infty)$.

Therefore, the p-value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the test statistic actually observed.

Remark 1: The smaller the p-value becomes, the more compelling is the evidence that the null hypothesis should be rejected.

Remark 2: If the p-value is less than or equal to $\alpha$, we reject the null hypothesis with a level $\alpha$ test.

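Example: Suppose $\bar{x} = 50.5$ in the previous example. Use the p-value to test $H_0: \mu = 50, \, H_1: \mu > 50$ at $\alpha = 0.05$.

The smallest rejection region containing the observed value is $[50.5, \infty)$.
The p-value is $P_{\mu=50}(\bar{X} \ge 50.5) = P(Z \ge 0.5) = 0.3085>0.05$.
Therefore we do not reject the null hypothesis at $\alpha = 0.05$.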
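
As a quick numerical check, the p-values for the three observed sample means discussed above can be computed in R. This is a sketch under the assumptions of the running example ($\bar{X} \sim N(50,1)$ under $H_0$, right-sided alternative); the names `xbar_obs` and `p_values` are illustrative.

```r
# p-value for H1: mu > 50 is P(Xbar >= observed value) computed under H0.
mu0 <- 50
se  <- 1
xbar_obs <- c(50.5, 51.645, 52)

p_values <- pnorm(xbar_obs, mean = mu0, sd = se, lower.tail = FALSE)
print(round(p_values, 4))   # about 0.3085, 0.0500, 0.0228

# Decision at level alpha: reject H0 exactly when p-value <= alpha.
alpha <- 0.05
reject <- p_values <= alpha
print(reject)               # FALSE TRUE TRUE
```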

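Steps to perform a hypothesis testing with the significance level $\alpha$

1. Write the null and alternative hypothesis.
   - $H_0:\theta=\theta_0$ (where $\theta_0$ is a fixed, known number)
   - $H_1$: one of
     - $H_1: \theta>\theta_0$ or $\theta <\theta_0$ (one-sided)
     - $H_1:\theta \neq \theta_0$ (two-sided)
2. Find a good estimator $\widehat{\theta}$ of $\theta$ and choose a test statistic $T$ based on $\hat{\theta}$.
   - $\hat{\theta}$ can be chosen as the test statistic.
   - Alternatively, use a function of $\widehat{\theta}$ and $\theta_0$ whose distribution is known when $\theta = \theta_0$. Usually, it is in the form of
     $\frac{\hat{\theta}-\theta_0}{\sqrt{Var(\hat{\theta})}}$
     - where $\sqrt{Var(\hat{\theta})}$ is the standard error of $\hat{\theta}$.
3. Find the distribution of $T$ under the null hypothesis (i.e., when $\theta=\theta_0$).
4. Make a decision based on the observed value of the test statistic.
   1. Find a rejection region such that $P_{\theta=\theta_0}(T \in {\rm Rejection \, Region}) = \alpha$. Reject the null hypothesis if the observed test statistic is in the rejection region. Or,
   2. Compute the p-value from the observed value of $T$, and reject the null hypothesis if the p-value is at most $\alpha$.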

Duality of confidence intervals with hypothesis tests

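A confidence interval gives a range of plausible values for $\theta$. If $\theta_0$ is not among these plausible values, the data provide evidence against $H_0:\theta=\theta_0$.

We reject $H_0:\theta=\theta_0$ in favor of $H_1:\theta\ne\theta_0$ at significance level $\alpha$ if and only if $\theta_0$ is not contained in the $1-\alpha$ confidence interval. The decision rule is to

reject the null hypothesis if $\theta_0$ is not in the $1-\alpha$ confidence interval, which gives a level $\alpha$ test.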

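Example: Construct a $95$% confidence interval for $\mu$ when the population is $N(\mu,100)$, $n=100$, and $\bar{x} = 50.5$. Use it to test whether the mean is different from $\mu=50$ at $\alpha = 0.05$.

Hypotheses: $H_0:\mu = 50,\,\, H_1:\mu\ne 50$.
$95$% confidence interval: $\bar{x} \pm (1.96)\frac{10}{10} = 50.5 \pm 1.96 = (48.54, 52.46)$.
Since $50$ is in the interval, we do not reject the null hypothesis at $\alpha = 0.05$.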
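
The duality can be verified numerically: the confidence-interval rule and the rejection-region rule give the same decision. The sketch below assumes the numbers of this example; all variable names are illustrative.

```r
# Two-sided test of H0: mu = 50 via (a) the 95% CI and (b) the rejection region.
xbar <- 50.5; sigma <- 10; n <- 100; mu0 <- 50; alpha <- 0.05

half_width <- qnorm(1 - alpha / 2) * sigma / sqrt(n)
ci <- c(xbar - half_width, xbar + half_width)   # about (48.54, 52.46)

# (a) Reject when mu0 falls outside the confidence interval.
reject_by_ci <- (mu0 < ci[1]) || (mu0 > ci[2])

# (b) Reject when |z_obs| >= z_{alpha/2}.
z_obs <- (xbar - mu0) / (sigma / sqrt(n))
reject_by_rr <- abs(z_obs) >= qnorm(1 - alpha / 2)

print(ci)
print(c(reject_by_ci, reject_by_rr))   # both FALSE: do not reject H0
```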

The same duality principle holds for the one-sided hypotheses. For example, for the one-sided hypothesis

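$H_0:\theta=\theta_0, \,\,H_1: \theta<\theta_0$, we check whether $\theta_0$ is contained in a one-sided interval $(-\infty, U(X_1,\dots,X_n)]$ with confidence level $1-\alpha$, that is, $P_\theta(\theta \le U(X_1,\dots,X_n)) = 1-\alpha$. Denote the observed $1-\alpha$ interval by $(-\infty, u(x_1,\dots,x_n)]$. The decision rule is,

For $H_1: \theta<\theta_0$: reject the null hypothesis if $u(x_1,\dots,x_n)<\theta_0$.
For $H_1:\theta>\theta_0$: reject the null hypothesis if $l(x_1,\dots,x_n)>\theta_0$, where $[l(x_1,\dots,x_n), \infty)$ is the observed one-sided $1-\alpha$ interval with a lower bound.

Confidence intervals (for example, for $\mu$) typically take the following forms.
Two-sided: $[\hat{\theta}- {\rm coeff}(\alpha/2) \sqrt{Var(\hat{\theta})},\hat{\theta}+ {\rm coeff}(\alpha/2) \sqrt{Var(\hat{\theta})}]$
One-sided with an upper bound: $(-\infty, \hat{\theta}+ {\rm coeff}(\alpha) \sqrt{Var(\hat{\theta})}]$
One-sided with a lower bound: $[\hat{\theta}- {\rm coeff}(\alpha) \sqrt{Var(\hat{\theta})},\infty)$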

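Example: Construct a one-sided $95$% confidence interval for $\mu$ when the population is $N(\mu,100)$, $n=100$, and $\bar{x} = 50.5$. Use it to test whether the mean has increased from $\mu=50$ at $\alpha = 0.05$.

Hypotheses: $H_0:\mu=50$, $H_1: \mu >50$.
One-sided $95$% confidence interval: $[\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}, \infty) = [50.5-1.645,\infty) = [48.855,\infty)$.
Since the lower bound satisfies $48.855<50$, the value $50$ is in the interval and we do not reject the null hypothesis at $\alpha = 0.05$.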

Tests about one mean

Learning objective

Understand how to perform statistical hypothesis tests using three methods (rejection region, p-value, confidence interval) regarding the population mean

Setting

Suppose we have a random sample $(X_1,\dots,X_n)$ and we are interested in the population mean $\mu = E[X_i]$.

Null and alternative hypothesis:

$H_0: \mu = \mu_0$

$H_1: \mu>\mu_0$ or $\mu <\mu_0$ or $\mu \neq \mu_0$

$X_1, \ldots, X_n\sim N(\mu, \sigma^2)$ with $\sigma^2$ known.

Hypothesis
1. $H_0: \mu=\mu_0.$ $H_1: \mu<\mu_0$ .
2. $H_0: \mu=\mu_0.$ $H_1: \mu> \mu_0$ .
3. $H_0: \mu=\mu_0.$ $H_1: \mu\ne \mu_0$ .
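Test statistic: $\bar{X}$ is a good estimator of $\mu$. We consider its standardized version $Z$:
$Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$
Distribution of $Z$ under the null hypothesis.
- When $\mu=\mu_0$, $X_i \sim N(\mu_0,\sigma^2)$, so $\bar{X} \sim N(\mu_0, \sigma^2/n)$ and $Z\sim N(0,1)$.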

Make a decision based on the observed value of the test statistic.
$z_{obs} = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$

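Method 1: find a rejection region such that $P_{\mu=\mu_0}(Z \in {\rm Rejection \, Region}) = \alpha$.
- For three types of alternative hypotheses, we can choose rejection regions as
  - $H_1: \mu< \mu_0$: $\{z_{obs}; z_{obs}\le -z_{\alpha}\}$.
  - $H_1: \mu> \mu_0$: $\{z_{obs}; z_{obs}\ge z_{\alpha}\}$.
  - $H_1: \mu\neq\mu_0$: $\{z_{obs}; |z_{obs}|\ge z_{\alpha/2} \}$.
- Reject the null hypothesis if $z_{obs}$ falls into the rejection region.

Method 2: compute the p-value.
- $H_1: \mu< \mu_0$: the smallest rejection region containing $z_{obs}$ is $(-\infty, z_{obs}]$, so the p-value is $P(Z\le z_{obs})$.
- $H_1: \mu> \mu_0$: the smallest rejection region containing $z_{obs}$ is $[z_{obs},\infty)$, so the p-value is $P(Z\ge z_{obs})$.
- $H_1: \mu\neq\mu_0$: the smallest rejection region containing $z_{obs}$ is $(-\infty, -|z_{obs}|]\cup [|z_{obs}|, \infty)$, so the p-value is $2P(Z\ge |z_{obs}|)$.
- Reject the null hypothesis if the p-value is at most $\alpha$.

Method 3: construct a $1-\alpha$ confidence interval.
- $H_1: \mu< \mu_0$: the one-sided $1-\alpha$ confidence interval is $(-\infty,\bar{x}+z_{\alpha} \frac{\sigma}{\sqrt{n}}]$. Reject the null hypothesis if $u(x_1,\dots,x_n) =\bar{x}+z_{\alpha} \frac{\sigma}{\sqrt{n}}< \mu_0$.
- $H_1: \mu> \mu_0$: the one-sided $1-\alpha$ confidence interval is $[\bar{x}-z_{\alpha} \frac{\sigma}{\sqrt{n}}, \infty)$. Reject the null hypothesis if $l(x_1,\dots,x_n) =\bar{x}-z_{\alpha} \frac{\sigma}{\sqrt{n}}> \mu_0$.
- $H_1: \mu\neq\mu_0$: the two-sided $1-\alpha$ confidence interval is $[\bar{x}-z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\bar{x}+z_{\alpha/2} \frac{\sigma}{\sqrt{n}}]$. Reject the null hypothesis if $u(x_1,\dots,x_n) < \mu_0$ or $l(x_1,\dots,x_n)>\mu_0$.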

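Example: Let $X$ be a measurement of interest, and suppose $X$ follows $N(\mu, 1296)$. We want to test whether the mean differs from $1460$. A random sample of size $n = 27$ gives $\bar{x} = 1478$. Use $\alpha=0.05$.

Hypothesis
$H_0: \mu=1460.$ $H_1: \mu\ne 1460$ .
Test statistic: we standardize $\bar{X}$ to obtain $Z$,
$Z = \frac{\bar{X}-1460}{36/\sqrt{27}}$
Distribution under the null: when $\mu=1460$, $X_i \sim N(1460, 1296)$, so $\bar{X} \sim N(1460, 1296/27)$ and $Z\sim N(0,1)$.
The observed test statistic $z_{obs}$ is
$z_{obs} = \frac{1478-1460}{36/\sqrt{27}} = 2.598$
Method 1: the rejection region is $\{z; |z|\ge z_{\alpha/2} = z_{0.025} = 1.96\}$. Since $2.598>1.96$, we reject the null hypothesis at $\alpha = 0.05$. OR,
Method 2: the p-value is $P(|Z| \ge 2.598) = 2(0.0047)=0.0094 <0.05$. Since the p-value is smaller than $0.05$, we reject the null hypothesis at $\alpha = 0.05$. OR,
Method 3: the $(1-0.05)$ confidence interval is $1478 \pm (1.96)\frac{36}{\sqrt{27}} = [1464.4, 1491.6]$. Since $\mu_0 = 1460$ is not in the interval, we reject the null hypothesis at $\alpha= 0.05$.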
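
The three kinds of alternatives can be wrapped in one small helper. The function `z_test` below is a hypothetical illustration written for these notes (it is not a base R function); it assumes a normal population with known $\sigma$ and computes the observed statistic and p-value from summary statistics.

```r
# One-sample z-test from summary statistics (sigma known).
# `z_test` is an illustrative helper, not part of base R.
z_test <- function(xbar, mu0, sigma, n,
                   alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z_obs <- (xbar - mu0) / (sigma / sqrt(n))
  p_value <- switch(alternative,
    less      = pnorm(z_obs),                                # P(Z <= z_obs)
    greater   = pnorm(z_obs, lower.tail = FALSE),            # P(Z >= z_obs)
    two.sided = 2 * pnorm(abs(z_obs), lower.tail = FALSE))   # 2 P(Z >= |z_obs|)
  list(z_obs = z_obs, p_value = p_value)
}

# The example above: N(mu, 1296), n = 27, xbar = 1478, H1: mu != 1460.
res <- z_test(xbar = 1478, mu0 = 1460, sigma = 36, n = 27)
print(res$z_obs)     # about 2.598
print(res$p_value)   # about 0.0094
```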

$X_1, \ldots, X_n\sim N(\mu, \sigma^2)$ $\sigma^2$ unknown.

Hypothesis
1. $H_0: \mu=\mu_0.$ $H_1: \mu<\mu_0$ .
2. $H_0: \mu=\mu_0.$ $H_1: \mu> \mu_0$ .
3. $H_0: \mu=\mu_0.$ $H_1: \mu\ne \mu_0$ .
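Test statistic: $\bar{X}$ is a good estimator of $\mu$, but we cannot use $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$ (since $\sigma$ is unknown). We consider
$T = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$
Distribution of $T$ under the null hypothesis.
- When $\mu=\mu_0$, $X_i \sim N(\mu_0,\sigma^2)$, and $T$ follows a $t$ distribution with $n-1$ degrees of freedom.
Lemma: If $X_i\sim N(\mu,\sigma^2)$, then $\frac{\bar{X}-\mu} {S/\sqrt{n}} \sim t(n-1)$, the $t$ distribution with $n-1$ degrees of freedom.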
Make a decision based on the observed value of the test statistic.
$t_{obs} = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$

Method 1: find a rejection region such that $P_{\mu=\mu_0}(T \in {\rm Rejection \, Region}) = \alpha$.
- For three types of alternative hypotheses, we can choose rejection regions as
  - $H_1: \mu< \mu_0$: $\{t_{obs}; t_{obs}\le -t_{\alpha}(n-1)\}$.
  - $H_1: \mu> \mu_0$: $\{t_{obs}; t_{obs}\ge t_{\alpha}(n-1)\}$.
  - $H_1: \mu\neq\mu_0$: $\{t_{obs}; |t_{obs}|\ge t_{\alpha/2}(n-1) \}$.
- Reject the null hypothesis if $t_{obs}$ falls into the rejection region.

Method 2: compute the p-value.
- $H_1: \mu< \mu_0$: the smallest rejection region containing $t_{obs}$ is $(-\infty, t_{obs}]$, so the p-value is $P(T\le t_{obs})$.
- $H_1: \mu> \mu_0$: the smallest rejection region containing $t_{obs}$ is $[t_{obs},\infty)$, so the p-value is $P(T\ge t_{obs})$.
- $H_1: \mu\neq\mu_0$: the smallest rejection region containing $t_{obs}$ is $(-\infty, -|t_{obs}|]\cup [|t_{obs}|, \infty)$, so the p-value is $2P(T\ge |t_{obs}|)$.

Method 3: construct a $1-\alpha$ confidence interval.
- $H_1: \mu< \mu_0$: the one-sided $1-\alpha$ confidence interval is $(-\infty,\bar{x}+t_{\alpha}(n-1) \frac{s}{\sqrt{n}}]$. Reject the null hypothesis if $u(x_1,\dots,x_n) =\bar{x}+t_{\alpha}(n-1) \frac{s}{\sqrt{n}}< \mu_0$.
- $H_1: \mu> \mu_0$: the one-sided $1-\alpha$ confidence interval is $[\bar{x}-t_{\alpha}(n-1) \frac{s}{\sqrt{n}}, \infty)$. Reject the null hypothesis if $l(x_1,\dots,x_n) =\bar{x}-t_{\alpha}(n-1) \frac{s}{\sqrt{n}}> \mu_0$.
- $H_1: \mu\neq\mu_0$: the two-sided $1-\alpha$ confidence interval is $[\bar{x}-t_{\alpha/2}(n-1) \frac{s}{\sqrt{n}},\bar{x}+t_{\alpha/2}(n-1) \frac{s}{\sqrt{n}}]$. Reject the null hypothesis if $u(x_1,\dots,x_n) < \mu_0$ or $l(x_1,\dots,x_n)>\mu_0$.

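Example: We want to test whether the population mean $\mu$ is less than $500$. A random sample of size $25$ from a normal population gives $\bar{x} = 308.8$ and $s = 115.15$. Use $\alpha=0.01$.

Hypotheses: $H_0: \mu = 500$, $H_1: \mu <500$.
Test statistic: $T = \frac{\bar{X} - 500}{S/\sqrt{25}}$
Under the null hypothesis, $T$ follows $t(24)$.
Observed test statistic: $t_{obs} = (308.8-500)/(115.15/\sqrt{25}) = -8.30$.
Method 1: $t_{obs} < -t_{0.01}(24) = -2.492$, so we reject the null hypothesis at $\alpha = 0.01$.
Method 2: the p-value is $P_{\mu= 500}(T<t_{obs}) = P_{\mu= 500}(T<-8.30) \approx 0$, so we reject the null hypothesis.
Method 3: the one-sided $99$% confidence interval is $(-\infty, \bar{x} + t_{0.01}(24) \frac{s}{\sqrt{25}}] = (-\infty, 308.8 + 2.492(115.15/\sqrt{25})] = (-\infty, 366.191]$. Since $366.191 < 500$, we reject the null hypothesis.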
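
The same three methods can be carried out in R from the summary statistics alone. This is a minimal sketch assuming the numbers of the example above; variable names are illustrative.

```r
# One-sample t-test (sigma unknown) for H0: mu = 500 vs H1: mu < 500.
xbar <- 308.8; s <- 115.15; n <- 25; mu0 <- 500; alpha <- 0.01
df <- n - 1

t_obs <- (xbar - mu0) / (s / sqrt(n))            # about -8.30

# Method 1: rejection region t_obs <= -t_alpha(n-1).
t_crit <- qt(alpha, df = df)                     # about -2.492
reject_rr <- t_obs <= t_crit

# Method 2: p-value P(T <= t_obs).
p_value <- pt(t_obs, df = df)

# Method 3: upper bound of the one-sided 99% confidence interval.
upper <- xbar + qt(1 - alpha, df = df) * s / sqrt(n)   # about 366.19
reject_ci <- upper < mu0

print(c(t_obs = t_obs, t_crit = t_crit, upper = upper))
print(c(reject_rr, p_value <= alpha, reject_ci))       # all TRUE
```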

$X_1, \ldots, X_n$ from an arbitrary distribution, large $n$

Hypothesis
1. $H_0: \mu=\mu_0.$ $H_1: \mu<\mu_0$ .
2. $H_0: \mu=\mu_0.$ $H_1: \mu> \mu_0$ .
3. $H_0: \mu=\mu_0.$ $H_1: \mu\ne \mu_0$ .
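Test statistic: $\bar{X}$ is a good estimator of $\mu$. We consider
$Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\,\, \mbox{ or }\,\,Z = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$
(the first when $\sigma$ is known, the second when $\sigma$ is unknown).
Distribution of $Z$ under the null.
- By the CLT, for large $n$, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \dot\sim N(0,1)$.
- Since $S \approx \sigma$ for large $n$, we also have $\frac{\bar{X}-\mu}{S/\sqrt{n}} \dot\sim N(0,1)$.
- Under the null hypothesis, both test statistics approximately follow a standard normal distribution.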
Make a decision based on the observed value of the test statistic.
$z_{obs} = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\,\,\mbox{or}\,\, \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$

Summary

Given a random sample $(X_1,\dots,X_n)$,

Settings	Test statistic
$X_i \sim N(\mu,\sigma^2)$, $\sigma^2$ known; $H_0 : \mu=\mu_0$.	$Z = \frac{\bar{X}-\mu_0}{\sqrt{\sigma^2/n}} \overset{H_0}{\sim} N(0,1)$
$X_i \sim N(\mu,\sigma^2)$, $\sigma^2$ unknown; $H_0 : \mu=\mu_0$.	$T = \frac{\bar{X}-\mu_0}{\sqrt{S^2/n}} \overset{H_0}{\sim} t(n-1)$
$X_i$ from any distribution, large $n$; $\mu = E[X_i]; H_0 : \mu=\mu_0$.	$Z = \frac{\bar{X}-\mu_0}{\sqrt{\sigma^2/n}} \overset{H_0}{\dot\sim} N(0,1)$ or $Z = \frac{\bar{X}-\mu_0}{\sqrt{S^2/n}} \overset{H_0}{\dot\sim} N(0,1)$

where $S^2$ is the sample variance.

Tests about two means

Learning objective

Understand how to perform statistical hypothesis tests using three methods (rejection region, p-value, confidence interval) regarding the difference of population means

Setting

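Suppose we have two random samples $(X_1,\dots, X_{n_X})$ and $(Y_1,\dots,Y_{n_Y})$ and we are interested in the difference of population means $\mu_X-\mu_Y = E[X_i] - E[Y_i]$.

Null and alternative hypothesis:

$H_0: \mu_X = \mu_Y$

$H_1: \mu_X>\mu_Y$ or $\mu_X <\mu_Y$ or $\mu_X \neq \mu_Y$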

Two independent samples

$X_1, \ldots, X_{n_X}\sim N(\mu_X, \sigma_X^2)$, $Y_1, \ldots, Y_{n_Y}\sim N(\mu_Y, \sigma_Y^2)$ with $\sigma_X^2, \sigma_Y^2$ known.

Hypothesis
1. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X<\mu_Y$ .
2. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X>\mu_Y$ .
3. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X\ne\mu_Y$ .
Test statistic: $\bar{X}-\bar{Y}$ is a good estimator of $\mu_X-\mu_Y$. We consider its standardized version $Z$:
$Z = \frac{\bar{X}-\bar{Y}-0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$
Distribution of $Z$ under the null.
- In general, $\bar{X}-\bar{Y} \sim N(\mu_X-\mu_Y, \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y})$. When $\mu_X=\mu_Y$,
  $\bar{X}-\bar{Y} \sim N(0, \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y})$ and therefore $Z\sim N(0,1)$.

Make a decision based on the observed value of the test statistic.
$z_{obs} = \frac{\bar{x}-\bar{y}-0}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}}$

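To perform a test with significance level $\alpha$,
- find a rejection region with significance level $\alpha$
- find a p-value
- construct a $1-\alpha$ confidence interval for $\mu_X-\mu_Y$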

Example: The amount of a certain trace element in blood is known to be normally distributed and vary with a standard deviation of 5 ppm (parts per million) for female donors and 10 ppm for male blood donors. Random samples of 25 female and 25 male donors yield concentration means of 33 and 28 ppm, respectively. A doctor wants to know whether the population means of concentrations of the element are higher for women.

Hypothesis
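$H_0: \mu_W=\mu_M.$ $H_1: \mu_W > \mu_M$ .
Test statistic: $\bar{X}- \bar{Y}$ is a good estimator of $\mu_W - \mu_M$. We consider its standardized version $Z$,
$Z = \frac{\bar{X}-\bar{Y}-0 }{\sqrt{5^2/25+10^2/25}}$
Under the null hypothesis, $Z\sim N(0,1)$.
The observed test statistic $z_{obs}$ is
$z_{obs} = \frac{33-28}{\sqrt{1+4}} = 2.236$
Method 1: the rejection region is $\{z; z\ge z_{\alpha} = z_{0.05} = 1.645\}$. Since $2.236>1.645$, we reject the null hypothesis at $\alpha = 0.05$. OR,
Method 2: the p-value is $P(Z \ge 2.236) = 0.0127 <0.05$. Since the p-value is smaller than $0.05$, we reject the null hypothesis at $\alpha = 0.05$. OR,
Method 3: the one-sided $(1-0.05)$ confidence interval for $\mu_W-\mu_M$ is
$[(\bar{x}-\bar{y}) - z_{0.05} \sqrt{\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}},\infty) = [5-(1.645)\sqrt{1+4},\infty) = [1.322,\infty)$
Since $0$ is not in the interval, we reject the null hypothesis at $\alpha= 0.05$.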
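
The numbers in this example can be reproduced with a few lines of R. The sketch assumes the known standard deviations and sample means stated in the example; variable names are illustrative.

```r
# Two-sample z-test (variances known) for H0: mu_W = mu_M vs H1: mu_W > mu_M.
xbar <- 33; ybar <- 28          # sample means (women, men)
sigma_x <- 5; sigma_y <- 10     # known population standard deviations
n_x <- 25; n_y <- 25
alpha <- 0.05

se    <- sqrt(sigma_x^2 / n_x + sigma_y^2 / n_y)
z_obs <- (xbar - ybar - 0) / se                      # about 2.236

p_value <- pnorm(z_obs, lower.tail = FALSE)          # about 0.0127
lower   <- (xbar - ybar) - qnorm(1 - alpha) * se     # about 1.322

print(c(z_obs = z_obs, p_value = p_value, lower = lower))
# Reject H0 at alpha = 0.05 by all three methods:
print(c(z_obs >= qnorm(1 - alpha), p_value <= alpha, lower > 0))
```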

$X_1, \ldots, X_{n_X}\sim N(\mu_X, \sigma_X^2)$, $Y_1, \ldots, Y_{n_Y}\sim N(\mu_Y, \sigma_Y^2)$ with unknown variances

Hypothesis
1. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X<\mu_Y$ .
2. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X>\mu_Y$ .
3. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X\ne\mu_Y$ .
Test statistic: $\bar{X}-\bar{Y}$ is a good estimator of $\mu_X-\mu_Y$. The test statistic in the previous setting,
$Z = \frac{\bar{X}-\bar{Y}-0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$
cannot be used because $\sigma_X^2$ and $\sigma_Y^2$ are unknown.

Case 1: $\sigma^2 = \sigma_X^2= \sigma_Y^2$ unknown
Both samples carry information about the common variance $\sigma^2$, so we use the pooled estimator $S_p^2$ to estimate $\sigma^2$, where $S_p^2 = \frac{\sum_{i=1}^{n_X} (X_i-\bar{X})^2 + \sum_{i=1}^{n_Y} (Y_i-\bar{Y})^2}{n_X-1+n_Y-1}$. We consider the following test statistic
$T = \frac{\bar{X}-\bar{Y}-0}{\sqrt{ \frac{S_p^2}{n_X}+\frac{S_p^2}{n_Y}}}.$
Case 2: $\sigma_X^2 \ne \sigma_Y^2$ unknown; we use the following test statistic instead
$T = \frac{\bar{X}-\bar{Y}-0}{\sqrt{ \frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y}}}.$
Distribution of $T$ under the null.
Case 1: $\sigma^2 = \sigma_X^2= \sigma_Y^2$ unknown
- $\frac{\bar{X}-\bar{Y}-(\mu_X - \mu_Y)}{\sqrt{ \frac{S_p^2}{n_X}+\frac{S_p^2}{n_Y}}} \sim t(n_X+n_Y-2)$. When $\mu_X=\mu_Y$, $T \sim t(n_X+n_Y-2)$.
Case 2: $\sigma_X^2 \ne \sigma_Y^2$ unknown
- We use Welch's approximation and obtain
  $\frac{\bar{X}-\bar{Y}-(\mu_X - \mu_Y)}{\sqrt{ \frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y}}} \dot\sim t(r)$, where $r$ is the degrees of freedom from Welch's approximation. When $\mu_X=\mu_Y$, $T \dot\sim t(r)$.

Make a decision based on the observed value of the test statistic.

Example: The amount of a certain trace element in human blood is known to be normally distributed. Also, it is known that the variances of this trace element are the same between men and women. Random samples of 25 female and 25 male donors yield concentration means of 33 and 28 ppm, respectively, with a standard deviation of 5 ppm for female donors and 10 ppm for male blood donors. A doctor wants to know whether the population means of concentrations of the element are higher for women.

Hypothesis
$H_0: \mu_W=\mu_M.$ $H_1: \mu_W > \mu_M$ .
Test statistic $T$:
$T = \frac{\bar{X}-\bar{Y}-0 }{\sqrt{S_p^2/25+S_p^2/25}}$
Under the null hypothesis, $T\sim t(48)$.
The observed test statistic $t_{obs}$ is
$t_{obs} = \frac{33-28}{\sqrt{s_p^2/25+s_p^2/25}} = \frac{5}{\sqrt{2(62.5)/25}} = 2.236$
where $s_p^2$ is computed from $s_W^2 = \frac{1}{n_X-1}\sum_{i=1}^{n_X} (x_i-\bar{x})^2 = 5^2$ and $s_M^2 = \frac{1}{n_Y-1}\sum_{i=1}^{n_Y} (y_i-\bar{y})^2 = 10^2$ as $s_p^2 = \frac{\sum_{i=1}^{n_X} (x_i-\bar{x})^2 + \sum_{i=1}^{n_Y} (y_i-\bar{y})^2}{n_X-1+n_Y-1} = \frac{25(24)+(100)(24)}{48} = 62.5$.
Method 1: the rejection region is $\{t; t\ge t_{0.05}(48) = 1.677\}$. Since $2.236>1.677$, we reject the null hypothesis at $\alpha = 0.05$. OR,
Method 2: the p-value is $P(T \ge 2.236) = 0.015 <0.05$. Since the p-value is smaller than $0.05$, we reject the null hypothesis at $\alpha = 0.05$. OR,
Method 3: the one-sided $(1-0.05)$ confidence interval for $\mu_W-\mu_M$ is
$[(\bar{x}-\bar{y}) - t_{0.05}(48) \sqrt{\frac{s_p^2}{n_X}+\frac{s_p^2}{n_Y}},\infty) = [5-(1.677)\sqrt{(2\cdot 62.5)/25},\infty) = [1.250,\infty)$
Since $0$ is not in the interval, we reject the null hypothesis at $\alpha= 0.05$.
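
The pooled computation can be sketched in R from the summary statistics. The last lines also compute the degrees of freedom $r$ that would be used if equal variances were not assumed; the formula used there is the standard Welch-Satterthwaite expression, which is an addition not spelled out in these notes. Variable names are illustrative.

```r
# Pooled two-sample t-test from summary statistics (equal, unknown variances).
xbar <- 33; ybar <- 28
s_x <- 5; s_y <- 10           # sample standard deviations
n_x <- 25; n_y <- 25
alpha <- 0.05

sp2 <- ((n_x - 1) * s_x^2 + (n_y - 1) * s_y^2) / (n_x + n_y - 2)   # 62.5
se  <- sqrt(sp2 / n_x + sp2 / n_y)
df  <- n_x + n_y - 2

t_obs   <- (xbar - ybar) / se                          # about 2.236
t_crit  <- qt(1 - alpha, df = df)                      # about 1.677
p_value <- pt(t_obs, df = df, lower.tail = FALSE)      # about 0.015
lower   <- (xbar - ybar) - t_crit * se                 # about 1.25

# Assumption: Welch-Satterthwaite degrees of freedom for the unequal-variance case.
v_x <- s_x^2 / n_x; v_y <- s_y^2 / n_y
r <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))   # 600/17, about 35.3

print(c(sp2 = sp2, t_obs = t_obs, t_crit = t_crit, p_value = p_value, lower = lower, r = r))
```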

$X_1, \ldots, X_{n_X}$ and $Y_1, \ldots, Y_{n_Y}$ from unknown distributions, large $n_X$ and $n_Y$

Hypothesis
1. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X<\mu_Y$ .
2. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X>\mu_Y$ .
3. $H_0: \mu_X=\mu_Y.$ $H_1: \mu_X\ne\mu_Y$ .
Test statistic: $\bar{X}-\bar{Y}$ is a good estimator of $\mu_X-\mu_Y$. We consider the standardized statistic $Z$:
$Z = \frac{\bar{X}-\bar{Y}-0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$
when both variances are known or

$$Z = \frac{\bar{X}-\bar{Y}-0}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}}$$

when both variances are unknown.

Distribution of $Z$ under the null.
From an application of a version of CLT, we have
$$\frac{\bar{X}-\bar{Y}-(\mu_X - \mu_Y)}{\sqrt{ \frac{\sigma_X^2} {n_X}+\frac{\sigma_Y^2}{n_Y}}} \dot\sim N(0,1)\tag{2}$$
and
$$\frac{\bar{X}-\bar{Y}-(\mu_X - \mu_Y)}{\sqrt{ \frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y}}} \dot\sim N(0,1)\tag{3}$$
where (3) holds because $S_X^2 \approx \sigma_X^2$ and $S_Y^2 \approx \sigma_Y^2$ for large $n_X$ and $n_Y$. Therefore, when $\mu_X=\mu_Y$, $Z\dot\sim N(0,1)$.

Make a decision based on the observed value of the test statistic.

$$z_{obs} = \frac{\bar{x}-\bar{y}-0}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} {\rm\,\, or \,\,} \frac{\bar{x}-\bar{y}-0}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$$

A paired sample

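In a paired sample, $X_i$ and $Y_i$ are measured on the same unit and are generally dependent, so $\bar{X}$ and $\bar{Y}$ are not independent and the two-sample procedures above do not apply. Instead, we consider the differences $D_i = X_i-Y_i$ between $X_i$ and $Y_i$, for which $E[D_i] = \mu_D = \mu_X-\mu_Y$. Therefore we can test

$H_0: \mu_X=\mu_Y.$ $H_1: \mu_X<\mu_Y$ by testing $H_0: \mu_D=0.$ $H_1: \mu_D<0$
$H_0: \mu_X=\mu_Y.$ $H_1: \mu_X>\mu_Y$ by testing $H_0: \mu_D=0.$ $H_1: \mu_D>0$
$H_0: \mu_X=\mu_Y.$ $H_1: \mu_X\ne\mu_Y$ by testing $H_0: \mu_D=0.$ $H_1: \mu_D\ne0$

If $D_i$ follows a normal distribution (or the sample size is large), we can use the previous hypothesis testing procedure for one mean to test the hypotheses above.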

Example A researcher wants to study whether lack of sleep impacts cognitive performance. The researcher recruited 10 participants. Each participant is asked to take the tests twice: one after a normal sleep and the other after being kept awake for 24 hours.

	1	2	3	4	5	6	7	8	9	10
First test (normal sleep)	8.1	9.5	7.2	11.6	9.9	7.3	10	10.7	10.4	8.5
Second test (awake for 24 hours)	7.0	8.6	6.3	10.7	8.8	6.3	8.9	9.1	9.0	7.5

Suppose it is reasonable to assume that the difference of test scores is normally distributed. The researcher wants to show that lack of sleep decreases cognitive performance.

Hypothesis
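$H_0: \mu_N=\mu_L.$ $H_1: \mu_N > \mu_L$ .
Test statistic $T$:
$T = \frac{\bar{X}-\bar{Y}-0 }{\sqrt{S_D^2/10}}$
Under the null hypothesis, $T\sim t(9)$.
The observed test statistic $t_{obs}$, with $\bar{x}-\bar{y} = 1.1$ and $s_D = 0.2309$, is
$t_{obs} = \frac{\bar{x}-\bar{y}}{s_D/\sqrt{10}} = \frac{1.1}{0.2309/\sqrt{10}} = 15.06$
The p-value is $P(T \ge 15.06) < 0.0001 <0.05$. Since the p-value is smaller than $0.05$, we reject the null hypothesis at $\alpha = 0.05$.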
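
The paired analysis can be reproduced from the data table above. The sketch computes the statistic by hand from the differences and compares it with base R's `t.test` using `paired = TRUE`; the vector names are illustrative.

```r
# Paired t-test for H0: mu_N = mu_L vs H1: mu_N > mu_L, using the table above.
normal_sleep <- c(8.1, 9.5, 7.2, 11.6, 9.9, 7.3, 10, 10.7, 10.4, 8.5)
no_sleep     <- c(7.0, 8.6, 6.3, 10.7, 8.8, 6.3, 8.9, 9.1, 9.0, 7.5)

d <- normal_sleep - no_sleep          # differences D_i = X_i - Y_i
n <- length(d)

# By hand: one-sample t statistic on the differences.
t_obs   <- mean(d) / (sd(d) / sqrt(n))
p_value <- pt(t_obs, df = n - 1, lower.tail = FALSE)

# Same test via t.test().
fit <- t.test(normal_sleep, no_sleep, paired = TRUE, alternative = "greater")

print(c(mean_d = mean(d), sd_d = sd(d), t_obs = t_obs))   # about 1.1, 0.231, 15.06
print(unname(fit$statistic))                               # matches t_obs
```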

Summary

Given observed samples $(x_1,\dots,x_{n_X})$ and $(y_1,\dots,y_{n_Y})$ of the random samples $(X_1,\dots,X_{n_X})$ and $(Y_1,\dots,Y_{n_Y})$,

Settings	Test statistic
$X_i \sim N(\mu_X,\sigma_X^2)$, $Y_i\sim N(\mu_Y, \sigma_Y^2)$, $\sigma_X^2, \sigma_Y^2$ known; $H_0 : \mu_X-\mu_Y = 0$.	$Z = \frac{(\bar{X}-\bar{Y}) - (0)}{\sqrt{\sigma_X^2/n_X+ \sigma_Y^2/n_Y}} \overset{H_0}{\sim} N(0,1)$
$X_i \sim N(\mu_X,\sigma_X^2)$, $Y_i\sim N(\mu_Y, \sigma_Y^2)$, $\sigma^2 = \sigma_X^2 = \sigma_Y^2$ unknown; $H_0 : \mu_X-\mu_Y = 0$.	$T = \frac{(\bar{X}-\bar{Y}) - (0)}{\sqrt{\frac{S_p^2}{n_X}+\frac{S_p^2}{n_Y}}} \overset{H_0}{\sim} t(n_X+n_Y-2)$
$X_i \sim N(\mu_X,\sigma_X^2)$, $Y_i\sim N(\mu_Y, \sigma_Y^2)$, $\sigma_X^2 \ne \sigma_Y^2$ unknown; $H_0 : \mu_X-\mu_Y = 0$.	$T = \frac{(\bar{X}-\bar{Y}) - (0)}{\sqrt{\frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y}}} \overset{H_0}{\dot\sim} t(r)$, where $r$ is the df from Welch's approximation
Large $n_X$ and $n_Y$; $H_0 : \mu_X-\mu_Y = 0$.	$Z = \frac{(\bar{X}-\bar{Y}) - (0)}{\sqrt{S_X^2/n_X+ S_Y^2/n_Y}} \overset{H_0}{\dot\sim} N(0,1)$
Paired sample with $n = n_X=n_Y$, $X_i - Y_i \sim N(\mu_X-\mu_Y,\sigma_D^2)$, $\sigma_D^2$ unknown; $H_0 : \mu_X-\mu_Y = 0$.	$T = \frac{(\bar{X}-\bar{Y}) - (0)}{\sqrt{S_D^2/n}} \overset{H_0}{ \sim} t(n-1)$
Paired sample with $n = n_X=n_Y$, large $n$; $H_0 : \mu_X-\mu_Y = 0$.	$Z = \frac{\bar{D} - (0)}{\sqrt{S_D^2/n}} \overset{H_0}{\dot\sim} N(0,1)$

where

$S_p^2$ is a pooled sample variance estimator,
$S_X^2$ is the sample variance of $(X_1,\dots,X_{n_X})$,
$S_Y^2$ is the sample variance of $(Y_1,\dots,Y_{n_Y})$,
$S_D^2$ is the sample variance of $(D_1,\dots,D_n)$ with $D_i = X_i - Y_i$.

Tests about proportions

Learning objective

Understand how to perform statistical hypothesis tests using three methods (rejection region, p-value, confidence interval) regarding the population proportion or difference of population proportions

One sample (large n)

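Suppose $X_1,\dots,X_n \sim {\rm Ber}(p)$ and we are interested in the proportion $p$.

Null and alternative hypothesis:
1. $H_0: p = p_0$
2. $H_1: p > p_0$ or $p < p_0$ or $p \ne p_0$.
Test statistic: $\bar{X}$ is a good estimator of $p$. We consider the standardized statistic $Z$:
$Z=\frac{\bar{X}-p_0}{\sqrt{p_0(1-p_0)/n}}$
Distribution under the null: by the CLT, for $X_i \sim {\rm Ber}(p)$, we have,
$\frac{\bar{X}-p}{\sqrt{p(1-p)/n}} \dot\sim N(0,1).$
When $p=p_0$ (under $H_0$), $Z \dot\sim N(0,1)$.
With $z_{obs}$ the observed value of $Z$, the rejection regions and p-values for a level $\alpha$ test are
- Rejection region: $z_{obs} >z_{\alpha}$, $z_{obs} < -z_{\alpha}$, or $|z_{obs}| > z_{\alpha/2}$, respectively.
- p-value: $P(Z\ge z_{obs})$, $P(Z \le z_{obs})$, or $2P(Z \ge |z_{obs}|)$, respectively.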

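Remark: In step 2, we may consider,

$$Z'=\frac{\bar{X}-p_0}{\sqrt{\bar{X}(1-\bar{X})/n}}$$

Under the null hypothesis, $Z'$ is also approximately $N(0,1)$ for large $n$, so $Z'$ can be used in place of $Z$. The test based on $Z$ is called the score test and the test based on $Z'$ is called the Wald test. Both are approximately level $\alpha$ significance tests.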

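Example: Let $p$ be an unknown success probability. We test $H_0: p = 1/6$ against $H_1: p>1/6$ based on $n=8000$ independent trials. Let $Y$ be the number of successes in the $8000$ trials, and suppose we observe $y_{obs} = 1389$.

(a) Perform the score test at $\alpha = .1$.
(b) Perform the Wald test at $\alpha = .1$.

Score test
Hypotheses: $H_0: p = 1/6$, $H_1: p>1/6$
Test statistic: $Z=\frac{\bar{X}-1/6}{\sqrt{(1/6)(5/6)/8000}}$
Under the null hypothesis, $Z \dot\sim N(0,1)$.
Observed test statistic: $z_{obs } = \frac{(1389/8000)-1/6}{\sqrt{(1/6)(5/6)/8000}} = 1.670$
We reject the null hypothesis at $\alpha=.1$ since $z_{obs} \ge z_{\alpha} = 1.28$.
Wald test
Hypotheses: $H_0: p = 1/6$, $H_1: p>1/6$
Test statistic: $Z'=\frac{\bar{X}-1/6}{\sqrt{\bar{X}(1-\bar{X})/8000}}$
Under the null hypothesis, $Z' \dot\sim N(0,1)$.
Observed test statistic: $z'_{obs } = \frac{(1389/8000)-1/6}{\sqrt{(1389/8000)(1-(1389/8000))/8000}} = 1.6431$
We reject the null hypothesis at $\alpha = .1$ since $z'_{obs} \ge z_{\alpha} = 1.28$.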
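
The two statistics differ only in how the standard error is estimated, which the following R sketch makes explicit. It assumes the counts from the example above; variable names are illustrative.

```r
# Score and Wald tests for H0: p = 1/6 vs H1: p > 1/6.
y <- 1389; n <- 8000; p0 <- 1 / 6; alpha <- 0.1
p_hat <- y / n

# Score test: standard error evaluated at the null value p0.
z_score <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)          # about 1.670
# Wald test: standard error evaluated at the estimate p_hat.
z_wald  <- (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)    # about 1.643

z_crit <- qnorm(1 - alpha)                                 # about 1.28
p_score <- pnorm(z_score, lower.tail = FALSE)
p_wald  <- pnorm(z_wald,  lower.tail = FALSE)

print(c(z_score = z_score, z_wald = z_wald, z_crit = z_crit))
print(c(p_score = p_score, p_wald = p_wald))   # both below 0.1, so both tests reject
```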

One sample (exact)

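When $n$ is small, the distributions of $Z$ and $Z'$ may not be close to $N(0,1)$. The exact distribution of $\bar{X}$, where $X_i\sim {\rm Ber}(p)$, is not equal to any common distribution that we usually work with. However, we know that the sum of i.i.d. Bernoulli random variables has a Binomial distribution.

Null and alternative hypothesis:
1. $H_0: p = p_0$
2. $H_1: p > p_0$ or $p < p_0$ or $p \ne p_0$.
Test statistic $Y$:
$Y=\sum_{i=1}^n X_i$
Distribution of $Y$: since $X_i\sim {\rm Ber}(p)$, $Y \sim {\rm Bin}(n,p)$.
When $p=p_0$ (under $H_0$), $Y \sim {\rm Bin}(n,p_0)$.
Make a decision based on the observed value $y_{obs}$ of $Y$, with all probabilities computed under $H_0$.
- RR:
  - $H_1: p>p_0$: reject if $y_{obs} \ge k$, where $k$ is the smallest integer such that $P(Y\ge k ) \le \alpha$.
  - $H_1: p<p_0$: reject if $y_{obs} \le k$, where $k$ is the largest integer such that $P(Y\le k ) \le \alpha$.
  - $H_1: p\ne p_0$: reject if $y_{obs} \le k_1$ or $y_{obs} \ge k_2$, where $P(Y\le k_1 )+P(Y\ge k_2) \le \alpha$.
- p-value
  - $P(Y\ge y_{obs})$, $P(Y\le y_{obs})$, or $2\min\{P(Y\ge y_{obs}),P(Y\le y_{obs})\}$, respectively.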

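Example: Suppose $2$ out of $6$ randomly selected people believe that pineapple belongs on a pizza. Test whether $p=.5$ at $\alpha = 5$%, where $p$ is the proportion of people who believe that pineapple belongs on a pizza.

Hypotheses: $H_0: p = 0.5$, $H_1: p \ne 0.5$
Test statistic: $Y = \sum_{i=1}^{6} X_i$.
Under the null hypothesis, $Y \sim {\rm Bin}(6, 0.5)$.
The observed value is $y_{obs} = 2$. The rejection region has the form $y_{obs} \le k_1$ or $y_{obs} \ge k_2$, where $k_1$ and $k_2$ satisfy $P(Y\leq k_1) + P(Y\geq k_2) \le \alpha$ for $Y \sim {\rm Bin}(6, 0.5)$.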
k P(Y=k)
0 0.0156
1 0.0938
2 0.2344
3 0.3125
4 0.2344
5 0.0938
6 0.0156
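Taking $k_1 = 0$ and $k_2=6$ gives $P(Y\le k_1) + P(Y\ge k_2) = 0.0156\cdot 2 = 0.0312 \le 0.05$, while enlarging the region to include $1$ and $5$ would give $0.0312 + 2(0.0938) = 0.2188 > 0.05$.
The rejection region is therefore $y_{obs} \le 0$ or $y_{obs} \ge 6$.
Since $y_{obs} = 2$ is not in the rejection region, we do not reject the null hypothesis at $\alpha = 5$%.

Alternatively, the p-value is $2\min\{P(Y\le 2), P(Y\ge 2)\} = P(Y\le 2) + P(Y\ge 4) = 0.6875$.
Since the p-value is larger than $\alpha$, we do not reject the null hypothesis at $\alpha = 5$%.
We do not have sufficient evidence that the proportion differs from $50$%.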
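
The exact calculation can be reproduced with the binomial functions in base R. The sketch assumes the setting of the example ($n = 6$, $p_0 = 0.5$, $y_{obs} = 2$); variable names are illustrative.

```r
# Exact two-sided binomial test of H0: p = 0.5 with n = 6 and y_obs = 2.
n <- 6; p0 <- 0.5; y_obs <- 2; alpha <- 0.05

# Null probability mass function P(Y = k), k = 0, ..., 6.
pmf <- dbinom(0:n, size = n, prob = p0)
print(round(pmf, 4))

# Size of the rejection region {Y <= 0 or Y >= 6}.
size_rr <- pmf[1] + pmf[n + 1]                       # 2/64 = 0.03125

# p-value: 2 * min{P(Y <= y_obs), P(Y >= y_obs)}, capped at 1.
p_lower <- pbinom(y_obs, n, p0)                          # P(Y <= 2)
p_upper <- pbinom(y_obs - 1, n, p0, lower.tail = FALSE)  # P(Y >= 2)
p_value <- min(1, 2 * min(p_lower, p_upper))             # 0.6875

# binom.test() gives the same p-value here because the null distribution is symmetric.
p_builtin <- binom.test(y_obs, n, p = p0)$p.value

print(c(size_rr = size_rr, p_value = p_value, p_builtin = p_builtin))
```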


Two independent samples (large $n_X$ and $n_Y$)

Suppose we have independent samples $X_1, \ldots, X_{n_X}\sim {\rm Ber}(p_X)$ and $Y_1, \ldots, Y_{n_Y}\sim {\rm Ber}(p_Y)$, and we are interested in the difference $p_X-p_Y$.

Hypothesis
1. $H_0: p_X=p_Y.$ $H_1: p_X<p_Y$ .
2. $H_0: p_X=p_Y.$ $H_1: p_X>p_Y$ .
3. $H_0: p_X=p_Y.$ $H_1:p_X \ne p_Y$ .
Test statistic: $\bar{X}-\bar{Y}$ is a good estimator of $p_X-p_Y$. Under the null hypothesis $p=p_X=p_Y$, we have $E[\bar{X}-\bar{Y}] = p_X-p_Y = 0$ and ${\rm Var}(\bar{X}-\bar{Y}) = \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}=\frac{p(1-p)}{n_X} + \frac{p(1-p)}{n_Y}$. Since $X_i, Y_i \sim {\rm Ber}(p)$ under the null, we estimate the common $p$ by the pooled estimator $\hat{p} = \frac{\sum_{i=1}^{n_X} X_i + \sum_{i=1}^{n_Y} Y_i }{n_X+n_Y}$ and consider
$Z = \frac{\bar{X}-\bar{Y}-0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_X} + \frac{\hat{p}(1-\hat{p})}{n_Y}}}$
Distribution of $Z$ under the null. By an application of CLT,
$\frac{\bar{X}-\bar{Y}-(p_X-p_Y)}{\sqrt{\frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}}} \dot\sim N(0,1)$

When $p=p_X=p_Y$, and since $\hat{p} \approx p$ for large $n_X$ and $n_Y$, we have $Z \dot\sim N(0,1)$.

Make a decision based on the observed value of $Z$.

Remark 1: As in the one-sample case, the test statistic

$$Z' = \frac{\bar{X}-\bar{Y}-0}{\sqrt{\frac{\bar{X}(1-\bar{X})}{n_X} + \frac{\bar{Y}(1-\bar{Y})}{n_Y}}}$$

can be alternatively used (Wald test).

Test about variances

Learning objective

Understand one sample and two sample variance tests

Test about variances

Suppose $X_1,\dots,X_n \sim N(\mu, \sigma^2)$ and we are interested in the variance $\sigma^2$.

Null and alternative hypothesis:
1. $H_0: \sigma^2 = c$
2. $H_1: \sigma^2 > c$ $\sigma^2 < c$ $\sigma^2 \ne c$ .
Test statistic: $S^2$ is a good estimator of $\sigma^2$. However, the standardized statistic $(S^2-c)/\sqrt{{\rm Var}(S^2)}$ does not follow a common distribution, and since $S^2$ is not a mean or sum of i.i.d. random variables, CLT cannot be applied.
Recall $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i -\bar{X})^2$. From HTZ Theorem 5.5-2, we have,
$\sum_{i=1}^n (\frac{X_i-\bar{X}}{\sigma})^2 \sim \chi^2(n-1)$
that is, $(n-1)S^2 /\sigma^2 \sim \chi^2(n-1)$.
We consider the test statistic
$W = \sum_{i=1}^n \frac{(X_i-\bar{X})^2}{c} = \frac{(n-1)S^2}{c}$
Under the null hypothesis $\sigma^2 = c$, $W \sim \chi^2(n-1)$.
With $w_{obs}$ the observed value of $W$, the rejection regions and p-values for a level $\alpha$ test are as follows, where $\chi^2_{\alpha}(n-1)$ denotes the point with right-tail probability $\alpha$.
- Rejection Region:
  - $H_1: \sigma^2 > c$: $w_{obs} >\chi^2_{\alpha}(n-1)$
  - $H_1: \sigma^2 < c$: $w_{obs} < \chi^2_{1-\alpha}(n-1)$
  - $H_1: \sigma^2 \ne c$: $w_{obs} < \chi^2_{1-\alpha/2}(n-1)$ or $w_{obs} > \chi^2_{\alpha/2}(n-1)$.
- p-value:
  - $H_1: \sigma^2 > c$: $P(W > w_{obs})$
  - $H_1: \sigma^2 < c$: $P(W < w_{obs})$
  - $H_1: \sigma^2 \ne c$: $2\cdot\min\{P(W > w_{obs}),P(W < w_{obs})\}$

Several χ2 Distributions

In R, the critical value $\chi^2_{\alpha}(r)$ can be obtained by

```r
qchisq(p = 1 - alpha, df = r)
```

$P(W \le w)$ for $W \sim \chi^2(r)$ can be computed in R via

```r
pchisq(q = w, df = r)
```

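Example: The standard deviation of the amount of active ingredient in a pill is claimed to be $0.6$. We want to test whether the standard deviation is in fact smaller than $0.6$. A random sample of $n=23$ pills is taken, and the sample standard deviation of the $23$ pills is $0.42$. Assume the amount of active ingredient in each pill follows a Normal distribution. Use $\alpha = 0.05$.

Hypotheses: $H_0: \sigma^2 = 0.6^2$, $H_1: \sigma^2 <0.6^2$.
Test statistic: $W = \sum_{i=1}^{23}(X_i-\bar{X})^2 / 0.6^2 = 22S^2/0.6^2$
Under the null hypothesis, $W$ follows $\chi^2(22)$.
Observed test statistic: $w_{obs} = 22(0.42)^2/0.6^2 = 10.78$
Rejection region: $w_{obs} \le \chi^2_{1-\alpha}(22) = \chi^2_{0.95}(22) = 12.34$
Since $10.78 \le 12.34$, we reject the null hypothesis.
Alternatively, the p-value is $P(W\le w_{obs}) = P(W\le 10.78) = 0.022$ for $W \sim \chi^2 (22)$, which is smaller than $0.05$.

```r
qchisq(p = 1 - 0.95, df = 22)  # 12.338
pchisq(q = 10.78, df = 22)     # 0.022
```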

Suppose we have independent samples $X_1,\dots,X_{n_X} \sim N(\mu_X, \sigma_X^2)$ and $Y_1,\dots,Y_{n_Y} \sim N(\mu_Y, \sigma_Y^2)$, and we are interested in comparing $\sigma_X^2$ and $\sigma_Y^2$.

Null and alternative hypothesis:
1. $H_0: \sigma_X^2 = \sigma_Y^2$
2. $H_1: \sigma_X^2 > \sigma_Y^2$ or $\sigma_X^2<\sigma_Y^2$ or $\sigma_X^2 \ne \sigma_Y^2$.
Remark: The same procedure can be adapted to test $H_0:\sigma_X^2 = k\sigma_Y^2$ against $H_1: \sigma_X^2 >, \ne, < k\sigma_Y^2$.
Test statistic: $S_X^2, S_Y^2$ are good estimators of $\sigma_X^2$ and $\sigma_Y^2$. However, the standardized difference $(S_X^2-S_Y^2)/\sqrt{{\rm Var}(S_X^2 -S_Y^2)}$ does not follow any common distribution, and also CLT cannot be applied.

We consider the test statistic

$$F = \frac{S_X^2}{S_Y^2}$$

Recall that if $U \sim \chi^2(k_1)$, $V \sim \chi^2(k_2)$, and $U,V$ are independent, then
$\frac{U/k_1}{V/k_2} \sim F(k_1,k_2)$
the $F$ distribution with $k_1$ and $k_2$ degrees of freedom.

Several F Distributions

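From the one-sample result, $\sum_{i=1}^{n_X} (\frac{X_i- \bar{X}}{\sigma_X})^2 \sim \chi^2(n_X-1)$ and $\sum_{i=1}^{n_Y} (\frac{Y_i- \bar{Y}}{\sigma_Y})^2\sim \chi^2(n_Y-1)$, and the two sums are independent. Thus,

$$\frac{\frac{1}{n_X-1}\sum_{i=1}^{n_X} (\frac{X_i- \bar{X}}{\sigma_X})^2 }{\frac{1}{n_Y-1}\sum_{i=1}^{n_Y} (\frac{Y_i- \bar{Y}}{\sigma_Y})^2 } = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F(n_X-1,n_Y-1).$$

In particular, under the null hypothesis $\sigma_X^2 = \sigma_Y^2$, $F = S_X^2/S_Y^2 \sim F(n_X-1,n_Y-1)$. The observed test statistic is $F_{obs} = s_X^2/s_Y^2$.
The rejection regions and p-values for a level $\alpha$ test are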
- Rejection Region:
  - $H_1: \sigma_X^2 > \sigma_Y^2$: $s_X^2/s_Y^2 > F_\alpha(n_X-1,n_Y-1)$
  - $H_1:\sigma_X^2 < \sigma_Y^2$: $s_X^2/s_Y^2 < F_{1-\alpha}(n_X-1,n_Y-1)$
  - $H_1:\sigma_X^2 \ne \sigma_Y^2$: $s_X^2/s_Y^2 > F_{\alpha/2}(n_X-1, n_Y-1)$ or $s_X^2/s_Y^2 < F_{1-\alpha/2}(n_X-1, n_Y-1)$
  Remark: $F_{1-\alpha} ( n_X-1, n_Y-1) = \frac{1}{F_{\alpha} ( n_Y-1, n_X-1) }$. To see this, recall that $F_{1-\alpha} ( n_X-1, n_Y-1)$ is the number such that
  $P(F \ge F_{1-\alpha} ( n_X-1, n_Y-1)) = 1-\alpha$
  for $F \sim F(n_X-1,n_Y-1)$. Writing $F = (U/(n_X-1))/(V/(n_Y-1))$ with $U \sim \chi^2(n_X-1), V \sim \chi^2(n_Y-1)$ and $U,V$ independent, we have $1/F = (V/(n_Y-1))/(U/(n_X-1)) \sim F(n_Y-1,n_X-1)$. Therefore,
  $P(F \le F_{1-\alpha} ( n_X-1, n_Y-1))=P(1/F \ge 1/F_{1-\alpha} ( n_X-1, n_Y-1))=\alpha$
  which shows $F_{\alpha}(n_Y-1,n_X-1) = 1/F_{1-\alpha}(n_X-1,n_Y-1)$.
  Thus, all cases can be written in terms of right-tail critical regions, and we have rejection regions of
  - $H_1: \sigma_X^2 > \sigma_Y^2$: $s_X^2/s_Y^2 > F_\alpha(n_X-1,n_Y-1)$
  - $H_1:\sigma_X^2 < \sigma_Y^2$: $s_Y^2/s_X^2 > F_{\alpha}(n_Y-1,n_X-1)$
  - $H_1:\sigma_X^2 \ne \sigma_Y^2$: $s_X^2/s_Y^2 > F_{\alpha/2}(n_X-1, n_Y-1)$ or $s_Y^2/s_X^2 > F_{\alpha/2}(n_Y-1, n_X-1)$
- p-value: similarly, for p-values,
  - $H_1: \sigma_X^2 > \sigma_Y^2$: $P(F > s_X^2/s_Y^2)$, where $F \sim F(n_X-1,n_Y-1)$
  - $H_1:\sigma_X^2 < \sigma_Y^2$: $P(F < s_X^2/s_Y^2) = P(F' > s_Y^2/s_X^2)$, where $F' \sim F(n_Y-1 , n_X-1)$
  - $H_1:\sigma_X^2 \ne \sigma_Y^2$: $2\cdot\min\{P(F > s_X^2/s_Y^2),P(F' > s_Y^2/s_X^2)\}$, where $F \sim F(n_X-1,n_Y-1)$ and $F' \sim F(n_Y-1 , n_X-1)$.

Several F Distributions

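Example: Let $X$ and $Y$ be independent measurements distributed as $N(\mu_X, \sigma_X^2)$ and $N(\mu_Y, \sigma_Y^2)$, respectively. We compare the variances of $X$ and $Y$ by testing $H_0: \sigma_X^2 = \sigma_Y^2$ against $H_1: \sigma_X^2<\sigma_Y^2$. A random sample of $n_X = 30$ observations of $X$ gives $\bar{x} = 5.917$ and $s_X^2 = 0.4399$, and a random sample of $n_Y = 35$ observations of $Y$ gives $\bar{y} = 8.153$ and $s_Y^2= 1.4100$. Use the significance level of 5%.

Hypotheses: $H_0: \sigma_X^2 = \sigma_Y^2$, $H_1: \sigma_X^2<\sigma_Y^2$.
Test statistic: $F = S_X^2/S_Y^2$.
Under the null hypothesis, $F$ follows $F(29,34)$.
Observed test statistic: $F_{obs} = .4399 / 1.4100 = .312$.
Rejection region: $F_{obs} < F_{.95}(29,34) = \frac{1}{F_{.05}(34,29)} \approx .55$.
Since $.312 < .55$, we reject the null hypothesis at $\alpha$ = 5%.
In R, qf(.05, 34, 29, lower.tail = F) gives $F_{.05}(34,29)$, which is approximately 1.83.
Alternatively, the p-value is $P(F<.312) = P(1/F> 3.205) \approx .001$, where $1/F \sim F(34,29)$.
In R, pf(3.205, 34, 29, lower.tail = F) computes this probability.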

More examples on calculating Type I and II error probabilities and power of a statistical test

Learning objective

Know how to compute type I and type II error probabilities for a given test and given a true parameter value
Know how to find a power function of a given test
Know how to find a required sample size to achieve certain statistical power

Recall for a statistical test, there are two types of errors

Type I error: rejecting the null hypothesis when the null hypothesis is true.
Type II error: not rejecting the null hypothesis when the alternative hypothesis is true.

Example

$H_0: p = 1/2$ $H_1: p < 1/2.$ $n = 10$ $X_1,X_2,...,X_{10}$ $X_i = 1$ $X_i = 0$ $4$ $n<30$ $T = \sum_{i=1}^{10} X_i$ .

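(a) Find a rejection region whose significance level is at most $5$%.
(b) Find the type II error probability of this test when $p = 1/5$.
(c) Find the type II error probability of this test when $p=1/100$.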

	p = 0.5	p = 0.2	p = 0.01
0	0.0010	0.1074	0.9044
1	0.0098	0.2684	0.0914
2	0.0439	0.3020	0.0042
3	0.1172	0.2013	0.0001
4	0.2051	0.0881	0.0000
5	0.2461	0.0264	0.0000
6	0.2051	0.0055	0.0000
7	0.1172	0.0008	0.0000
8	0.0439	0.0001	0.0000
9	0.0098	0.0000	0.0000
10	0.0010	0.0000	0.0000

Solution

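Hypotheses: $H_0: p = 1/2$ vs. $H_1: p < 1/2.$
Test statistic: $T = \sum_{i=1}^{10} X_i$ .
Under $H_0$, $T \sim {\rm Bin}(10,1/2)$ .
Rejection region: we reject $H_0$ if $T_{obs} \leq k$, where $k$ is chosen so that the type I error probability is at most $5$ %, that is, $P_{p=.5}(T\le k)\le .05$ .

In general, the distribution of $T$ is $T \sim {\rm Bin}(10, p)$ .

We take $k=1$, since $P_{p=.5}(T\le 1) = .0107 < .05$ but $P_{p=.5}(T\le 2) = .0547 >.05$ . Because of the discreteness of $T$ , it is not possible to construct a test such that the type I error probability is exactly 5%.

Therefore, we reject $H_0$ if $T_{obs} \leq 1$ .
Type II error probability when $p= . 2$ : $H_0$ is not rejected when $T > 1$, so the type II error probability at $p=.2$ is $P_{p=.2}(T >1)$ .
When $p=.2$ , $T =\sum_{i=1}^{10} X_i$ follows ${\rm Bin}(10, .2)$ .
$P_{p=.2}(T >1) = 1 - (.1074+.2684) = .6242$
Type II error probability when $p= . 01$ : similarly, the type II error probability at $p=.01$ is $P_{p=.01}(T >1)$ .
$P_{p=.01}(T >1)= 1-(.9044+ .0914) \approx .0043$ (computed from the unrounded probabilities)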
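The cutoff and the error probabilities in this solution can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binom_pmf` and `binom_cdf` are our own and are not part of the notes.

```python
from math import comb

def binom_pmf(t, n, p):
    """P(T = t) for T ~ Bin(n, p)."""
    return comb(n, t) * p**t * (1 - p)**(n - t)

def binom_cdf(k, n, p):
    """P(T <= k) for T ~ Bin(n, p)."""
    return sum(binom_pmf(t, n, p) for t in range(k + 1))

n, p0, alpha = 10, 0.5, 0.05

# Largest k whose type I error probability P_{p0}(T <= k) stays at or below alpha.
k = max(j for j in range(n + 1) if binom_cdf(j, n, p0) <= alpha)

type1 = binom_cdf(k, n, p0)              # P_{p=.5}(T <= k)
type2_p02 = 1 - binom_cdf(k, n, 0.2)     # P_{p=.2}(T > k)
type2_p001 = 1 - binom_cdf(k, n, 0.01)   # P_{p=.01}(T > k)

print(k, round(type1, 4), round(type2_p02, 4), round(type2_p001, 4))
# prints: 1 0.0107 0.6242 0.0043
```

Because $T$ is discrete, the search over $k$ returns the largest cutoff whose type I error probability does not exceed 5%, rather than one that attains 5% exactly.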

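Definition (power): The power of a test at a parameter value $\theta_a$ is the probability that the test rejects $H_0$ when the true parameter value is $\theta_a$ . In other words,

{\rm power}(\theta_a; \mbox{Test}) = P_{\theta=\theta_a}(\mbox{Test Statistic }\in \mbox{Rejection Region})

(When the test is clear from the context, we write ${\rm power}(\theta_a; \mbox{Test})$ simply as ${\rm power}(\theta_a)$ . The quantity ${\rm power}(\theta_a)$ is also denoted by $K(\theta_a)$ .)

For $\theta_a\in \Theta_1$ , the power at $\theta_a$ equals one minus the type II error probability when $\theta = \theta_a$ , since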

\begin{align} &{\rm power}(\theta_a; \mbox{Test}) \\ &= P_{\theta=\theta_a}(\mbox{Test Statistic }\in \mbox{Rejection Region})\\ &=1-P_{\theta=\theta_a}(\mbox{Test Statistic }\notin \mbox{Rejection Region}) \\ &= 1- P(\mbox{do not reject }H_0 \mbox{ when $H_1$ is true}) \end{align}

For example, in the name tag example above,

The power at $p = 0.5$ is $P_{p=.5}(T \le 1) = .0107$ (this is the type I error probability).
The power at $p = 0.2$ is $P_{p=.2}(T\le 1) = .3758$ .
The power at $p = 0.01$ is $P_{p=.01}(T\le 1) = .9957$ .

In fact, we can regard power as a function of the candidate parameter values.

${\rm power}(p)= P_{p=p}(T\le 1) = \binom{10}{0}p^{0}(1-p)^{10} + \binom{10}{1} p(1-p)^9$ .
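To see how the power changes with the candidate value of $p$, the formula above can be evaluated on a grid. This is a small sketch in Python (standard library only); the function name `power` and the grid of values are illustrative choices.

```python
from math import comb

def power(p, n=10, k=1):
    """Power of the test 'reject H0 if T <= k' when T ~ Bin(n, p)."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(k + 1))

# Evaluate the power function on a small grid of candidate values of p.
for p in (0.01, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"p = {p:<4}  power = {power(p):.4f}")
```

The printed values decrease as $p$ moves from 0.01 toward the null value 0.5, which matches the intuition that alternatives far from the null are easier to detect.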


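Example: Consider a test that ignores $T$ and instead rejects $H_0$ with probability 1/20, regardless of the data. Find the type I error probability, the type II error probability when $p = .1$ , and the power function of the test.

Type I error probability = P(reject $H_0$ when $p = .5$ ) = 1/20.
Type II error probability at $p = .1$ = P(do not reject $H_0$ when $p=.1$ ) = 19/20.
power( $p$ ) = P(reject $H_0$ when the true value is $p$ ) = 1/20.
This is a valid level $\alpha = .05$ test. However, the "power" of the test is terrible.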

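Remark 1: A significance level of $\alpha$ only guarantees that the type I error probability is at most $\alpha$ when $\theta \in \Theta_0$ . It says nothing about how well the test detects the true $\theta$ when $\theta\in\Theta_1$ . Ideally, among all level $\alpha$ tests we would use the one with the highest power for every $\theta \in \Theta_1$ (a.k.a., uniformly most powerful test).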

Two questions: 1. does UMP test exist? 2. If so, how can we find the UMP test?

$\alpha$ $\alpha$ tests).

Remark 2: Since we are already working with "optimal" tests, for a fixed sample size the type I and type II error probabilities cannot both be made arbitrarily small. For example, given a fixed sample size, we need to increase the type I error probability to increase power. The only way of increasing power without increasing the type I error probability is to increase the sample size.
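The trade-off in Remark 2 can be illustrated with the binomial example ($n = 10$, reject $H_0$ if $T \le k$). The sketch below compares the cutoffs $k = 1$ and $k = 2$; the function name `reject_prob` is our own, and the choice of $p = 0.2$ as the alternative is only for illustration.

```python
from math import comb

def reject_prob(p, k, n=10):
    """P_p(T <= k) for T ~ Bin(n, p): probability that the test rejects H0."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(k + 1))

for k in (1, 2):
    type1 = reject_prob(0.5, k)      # type I error probability at p = 0.5
    power_02 = reject_prob(0.2, k)   # power at p = 0.2
    print(f"k = {k}: type I error = {type1:.4f}, power at p = 0.2 = {power_02:.4f}")
```

Moving from $k = 1$ to $k = 2$ raises the power at $p = 0.2$, but only at the cost of a larger type I error probability (which then exceeds 5%).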

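Remark 3: For a fixed sample size $n$ , the type II error probability depends on the distance between the true value $\theta_a$ and the null value $\theta_0$ . If $\theta_a$ is close to $\theta_0$ , the true value of $\theta$ (either $\theta_0$ or $\theta_a$ ) is difficult to detect, and the probability of not rejecting $H_0$ when $H_1$ is true tends to be large. If $\theta_a$ is far from $\theta_0$ , the true value is relatively easy to detect, and the type II error probability is considerably smaller.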

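Example: Let $X_1,X_2,...,X_n$ be a random sample of size $n$ from $N(\mu, 100)$ . We test $H_0: \mu = 60$ against $H_1: \mu> 60.$ Suppose $n = 25$ and we use the test statistic $T = \frac{\bar{X}-60}{10/\sqrt{25}}$ .

Find the rejection region of the test with significance level $5$ %.
Find the type II error probability of the test when $\mu = 65$ .
Find the power function of the test.
What sample size is needed for the type II error probability to be less than $1$ % when $\mu = 65$ ?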

Hypotheses: $H_0: \mu = 60$ vs. $H_1: \mu> 60.$
When $\mu = 60$ , $T \sim N(0,1)$ , so the rejection region is $[z_{.05},\infty) = [1.645,\infty)$ , since $P_{\mu = 60}(T \ge z_{0.05}) = .05$ .
When $\mu = 65$ , the type II error probability is $P(T \notin \mbox{Rejection Region}) = P_{\mu = 65}(T <1.645)$ . To compute it, we need the distribution of $T$ when $\mu = 65$ .
Note that $T = \frac{\bar{X}-60}{2}$ and $\bar{X} \sim N(\mu,4)$ . When $\mu = 65$ , $\bar{X} \sim N(65,4)$ .
Since $T = 0.5\bar{X} -30$ , we have $T\sim N(0.5(65)-30, 0.5^2(4)) = N(2.5, 1)$ .
Therefore, $P_{\mu=65}(T<1.645) = P(T-2.5 < 1.645-2.5) = P(Z<-0.855) = .196$ .
For a general value $\mu_a$ , the power is the probability of rejecting $H_0$ when $\mu = \mu_a$ , that is, $P_{\mu=\mu_a} (T \ge 1.645)$ .
When $\mu = \mu_a$ , $T = 0.5\bar{X} -30$ follows $N(0.5\mu_a - 30, 0.5^2(4)) = N(0.5\mu_a-30,1)$ .
$P_{\mu=\mu_a} (T \ge 1.645) = P_{\mu=\mu_a}(T - (0.5\mu_a-30) \ge 1.645-(0.5\mu_a - 30)) = 1-\Phi(1.645-(0.5\mu_a - 30))$ .
Therefore, the power at $\mu_a$ is $1-\Phi(1.645-(0.5\mu_a - 30))$ .
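The power function of this one-sided test is easy to evaluate numerically. The following sketch uses `statistics.NormalDist` from the Python standard library; the function name `power` and its default arguments simply encode the values of this example ($\mu_0 = 60$, $\sigma = 10$, $n = 25$, level 5%).

```python
from statistics import NormalDist

Z = NormalDist()                     # standard normal distribution
z_crit = Z.inv_cdf(0.95)             # z_{.05}, approximately 1.645

def power(mu_a, mu0=60.0, sigma=10.0, n=25):
    """P_{mu = mu_a}(T >= z_{.05}) for T = (Xbar - mu0) / (sigma / sqrt(n))."""
    shift = (mu_a - mu0) / (sigma / n**0.5)   # mean of T when mu = mu_a
    return 1 - Z.cdf(z_crit - shift)

print(power(60))        # type I error probability, 0.05 up to floating-point error
print(1 - power(65))    # type II error probability at mu = 65, approximately 0.196
```

With the default arguments, `shift` equals $0.5\mu_a - 30$, so `power(mu_a)` agrees with the closed form $1-\Phi(1.645-(0.5\mu_a - 30))$ up to the rounding of $z_{.05}$.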

For a general sample size $n$ , the test statistic is $T = \frac{\bar{X}-60}{10/\sqrt{n}}$ . We want to find $n$ such that
$P_{\mu = 65}(T <1.645)< .01.$
Since $\bar{X} \sim N(65, \frac{100}{n})$ , the distribution of $T$ when $\mu = 65$ is $T\sim N(\frac{65-60}{10/\sqrt{n}}, 1) = N(\sqrt{n}/2,1)$ .
$P_{\mu = 65}(T <1.645) =P_{\mu = 65}(T -\sqrt{n}/2<1.645-\sqrt{n}/2) = \Phi(1.645-\sqrt{n}/2) <.01$ .
This holds when $1.645-\sqrt{n}/2 < -z_{.01} = -2.326$ .
That is, $2(1.645+2.326) < \sqrt{n}$ , or $n > 63.08$ .
Therefore, a sample size of at least $n=64$ is required.
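The sample size calculation generalizes to the formula $n \ge \left((z_{\alpha}+z_{\beta})\sigma/(\mu_a-\mu_0)\right)^2$, where $\beta$ is the target type II error probability. The sketch below (Python standard library; the function names `required_n` and `type2` are our own) computes the smallest such $n$ and checks it against the type II error probability directly.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def required_n(mu_a, mu0=60.0, sigma=10.0, alpha=0.05, beta=0.01):
    """Smallest n whose type II error probability at mu_a is at most beta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / (mu_a - mu0)) ** 2)

def type2(n, mu_a, mu0=60.0, sigma=10.0, alpha=0.05):
    """P_{mu = mu_a}(T < z_alpha) for a sample of size n."""
    return Z.cdf(Z.inv_cdf(1 - alpha) - (mu_a - mu0) * n**0.5 / sigma)

n = required_n(65)
print(n)                                  # 64
print(type2(n, 65), type2(n - 1, 65))     # just below and just above 0.01
```

Checking `type2` at $n$ and $n-1$ confirms that 64 is the first sample size at which the type II error probability at $\mu = 65$ drops below 1%.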

Best Rejection (=Critical) Regions and Likelihood Ratio Tests

Learning objectives

Know how to find the most powerful level $\alpha$ test using Neyman-Pearson (NP) Lemma when both null and alternative hypotheses are simple.
Know how to find the uniformly most powerful level $\alpha$ test when the alternative hypothesis is composite and the best critical region from NP Lemma only depends on the null value.
Know how to find a rejection region based on the likelihood ratio test.

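Let $\Theta$ denote the parameter space.

A hypothesis is a statement of the form $H: \theta \in \Theta'$ for some $\Theta' \subseteq \Theta$ .
In hypothesis testing, we test $H_0 : \theta \in \Theta_0$ against $H_1:\theta \in \Theta_1$ , where $\Theta = \Theta_0 \cup \Theta_1$ and $\Theta_0 \cap \Theta_1 = \emptyset$ .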

Definition [Simple and composite hypotheses] Let $(X_1,\dots,X_n)$ be a random sample of size $n$ with $X_i \sim F_{\theta}$ .

$H:\theta \in \Theta'$ is said to be a simple hypothesis if the hypothesis uniquely specifies the distribution of the population from which the sample is taken.
Any hypothesis that is not a simple hypothesis is called a composite hypothesis.

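Example 1: Let $(X_1,\dots,X_n)$ be a random sample of size $n$ with $X_i \sim Exp(\theta)$ .

$H: \theta = 1$ : simple.
$H': \theta \ge 3$ : composite.
$H'':\theta\neq 2$ : composite.
$H_0: \theta = 1$ vs. $H_1: \theta>1$ : $H_0$ is simple and $H_1$ is composite.

Example 2: Let $(X_1,\dots,X_n)$ be a random sample of size $n$ with $X_i \sim N(\mu,\sigma^2)$ , where $\sigma^2$ is unknown.

$H: \mu=1$ : composite, since $\sigma^2$ is not specified.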

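Suppose we test a simple null hypothesis $H_0: \theta=\theta_0$ against a simple alternative hypothesis $H_1:\theta=\theta_1$ . We would like to choose a test T so that

the test $T$ has significance level $\alpha$ , that is, power $(\theta_0; T) = P(\mbox{Reject $H_0$ with T}; \theta = \theta_0) \le \alpha$ .
power $(\theta_1;T)$ is as large as possible

In other words, we seek a most powerful level $\alpha$ test.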

Remark: if two tests have different significance levels, they are not comparable.

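The Neyman-Pearson Lemma gives the most powerful test of a simple $H_0$ against a simple $H_1$ . Before we present the theorem, let's consider the following example to build an intuition about the best rejection regions.

Example: Let $X$ be a single observation from a normal distribution with mean $\mu$ and variance $1$ . We test $H_0:\mu=0$ against $H_1:\mu=5$ .

Intuitively, we should reject $H_0$ when the ratio $f(x;0)/f(x;5)$ is small, that is, for values of $x$ that are much more plausible under $N(5,1)$ than under $N(0,1)$ (large $f(x;5)$ relative to $f(x;0)$ ).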
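To build this intuition numerically, the density ratio can be tabulated at a few values of $x$. The sketch below uses `statistics.NormalDist` from the Python standard library; the grid of $x$ values is an arbitrary illustrative choice.

```python
from statistics import NormalDist

f0 = NormalDist(mu=0, sigma=1).pdf   # density under H0: mu = 0
f5 = NormalDist(mu=5, sigma=1).pdf   # density under H1: mu = 5

# The ratio f(x; 0) / f(x; 5) equals exp(12.5 - 5x), so it decreases as x grows.
for x in (0.0, 1.0, 2.5, 4.0, 5.0):
    print(f"x = {x:3.1f}   f(x;0)/f(x;5) = {f0(x) / f5(x):.3e}")
```

The ratio equals 1 at $x = 2.5$ and shrinks rapidly for larger $x$, so a region where the ratio is small is a region of large $x$.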

Theorem [Neyman-Pearson Lemma]:

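Suppose we test a simple null hypothesis $H_0:\theta=\theta_0$ against a simple alternative hypothesis $H_1:\theta=\theta_1$ , based on a random sample $X_1,X_2,...,X_n$ of size $n$ from a distribution with pdf (or pmf) $f(x;\theta)$ , where $\theta_0$ and $\theta_1$ are two possible values of $\theta$ . Let $L(\theta; x_1,\dots,x_n)$ denote the likelihood function, $L(\theta;x_1,...,x_n)=f(x_1;\theta)f(x_2;\theta)\cdots f(x_n;\theta).$

Then the level $\alpha$ test with the largest power at $\theta_1$ has the rejection region, RR, determined by

RR = \{(x_1,\dots,x_n); \frac{L(\theta_0;x_1,\dots,x_n)}{L(\theta_1;x_1,\dots,x_n)}\leq k\}.

The value of $k$ is chosen so that the test has significance level $\alpha$ . Such a test is the most powerful level $\alpha$ test for $H_0$ against $H_1$ .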

Remark: the rejection region based on NP Lemma can be equivalently defined via log-likelihoods.

Since $L(\theta_0;x_1,\dots,x_n)/L(\theta_1;x_1,\dots,x_n)\leq k$ if and only if $\log L(\theta_0;x_1,\dots,x_n)-\log L(\theta_1;x_1,\dots,x_n)\leq \log k$ , equivalently we can let the rejection region be
$RR = \{(x_1,\dots,x_n); \log L(\theta_0;x_1,\dots,x_n)-\log L(\theta_1;x_1,\dots,x_n) \le \log k\} .$

Example: Let $X_1,X_2,...,X_n$ be a random sample from $N(\mu,36)$ . Find the most powerful level $\alpha$ test for $H_0:\mu=50$ against $H_1:\mu=55$ .

By the NP Lemma, we reject $H_0$ if $\text{log-likelihood difference}\leq k'$ (we will determine $k'$ later).
The log-likelihood is $\log L(\mu; x_1,\dots,x_n) = -\frac{n}{2}\log(72\pi) - \frac{1}{72}\sum_{i=1}^{n}(x_i-\mu)^2$ ,
and therefore,
$\begin{align} \log L(50; x_1,\dots,x_n)-\log L(55; x_1,\dots,x_n) &= -\frac{1}{72} \left\lbrace \sum_{i=1}^{n}(x_i-50)^2-\sum_{i=1}^{n}(x_i-55)^2\right\rbrace\\ &= -\frac{1}{72} \left\lbrace \sum_{i=1}^{n}(x_i^2-100x_i +50^2)-\sum_{i=1}^{n}(x_i^2-110x_i + 55^2)\right\rbrace\\ &= -\frac{1}{72}(10 \sum_{i=1}^n x_i +50^2 n - 55^2n) \le k' \end{align}$
That is,
$10 \sum_{i=1}^n x_i +50^2 n - 55^2n \ge -72k'$
Equivalently,
$\frac{1}{n}\sum_{i=1}^n x_i \ge (-72k'-50^2n + 55^2n)/(10n)$ ,
that is, $\bar{x}\ge c$ with $c = (-72k'-50^2n + 55^2n)/(10n)$ .
Since $c$ is just a function of $k'$ , we can choose $c$ directly so that
$P_{H_0}(\bar{X}\geq c)=\alpha$ .
Since $\bar{X} \sim N(50, 36/n)$ under $H_0$ , we get $c=50+z_{\alpha}\sqrt{36/n}$ .
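As a numerical illustration, the sketch below computes the cutoff $c$ as the upper-$\alpha$ point of the null distribution of $\bar{X}$, which is $N(50, 36/n)$, together with the power at $\mu = 55$. It uses only the Python standard library; the function name `np_test` and the sample size $n = 16$ are assumptions made for the illustration and do not come from the notes.

```python
from statistics import NormalDist

def np_test(n, alpha=0.05, mu0=50.0, mu1=55.0, sigma=6.0):
    """Critical value c and power at mu1 for the test 'reject H0 if xbar >= c'."""
    se = sigma / n**0.5                               # standard deviation of Xbar
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # P_{H0}(Xbar >= c) = alpha
    power = 1 - NormalDist(mu1, se).cdf(c)            # P_{mu1}(Xbar >= c)
    return c, power

c, power = np_test(n=16)   # n = 16 is an illustrative choice
print(round(c, 3), round(power, 3))
```

For $n = 16$ the cutoff is $50 + 1.645 \cdot 1.5 \approx 52.47$, and the power at $\mu = 55$ is above 95%.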

Composite hypothesis and uniformly most powerful level $\alpha$ test

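When the alternative hypothesis is composite, we would like a level $\alpha$ test that is most powerful at every $\theta \in \Theta_1$ . Such test is called a uniformly most powerful (UMP) level $\alpha$ test.

Remark: A uniformly most powerful test does not always exist (e.g., for a two-sided alternative $H_1:\theta\neq\theta_0$ )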

Simple null and composite alternative hypothesis

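Consider testing $H_0:\mu=50$ against $H_1:\mu>50$ . If the most powerful level $\alpha$ test for $H_0:\mu=50$ against $H_1:\mu=\mu_1$ with $\mu_1>50$ has the same rejection region for any $\mu_1$ , the associated test should be a uniformly most powerful test.

More generally, it may happen that the best rejection region for testing $H_0:\theta=\theta_0$ against $H_1: \theta=\theta_1$ depends only on the null value (it depends on $\theta_0$ but not on the particular $\theta_1$ ). In such case, a test obtained by the NP-Lemma is a uniformly most powerful test.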

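Example: Let $X_1,X_2,...,X_n$ be a random sample from $N(\mu,36)$ . Find the uniformly most powerful level $\alpha$ test for $H_0:\mu=50$ against $H_1:\mu>50$ .

Steps:

Find the most powerful test for $H_0:\mu=50$ against $H_1:\mu=\mu_1$ for a fixed $\mu_1>50$ .
If the rejection region is the same for each $\mu_1>50$ , the test is a uniformly most powerful test for the simple $H_0:\mu=50$ against the composite $H_1:\mu>50$ .

$\begin{align} \log L(50; x_1,\dots,x_n)-\log L(\mu_1; x_1,\dots,x_n) &= -\frac{1}{72} \left\lbrace \sum_{i=1}^{n}(x_i-50)^2-\sum_{i=1}^{n}(x_i-\mu_1)^2\right\rbrace\\ &= -\frac{1}{72} \left\lbrace \sum_{i=1}^{n}(x_i^2-100x_i +50^2)-\sum_{i=1}^{n}(x_i^2-2\mu_1 x_i + \mu_1^2)\right\rbrace\\ &= -\frac{1}{72}(2(\mu_1-50) \sum_{i=1}^n x_i +50^2 n - \mu_1^2n) \le k' \end{align}$
That is,
$2(\mu_1-50) \sum_{i=1}^n x_i +50^2 n - \mu_1^2n \ge -72k'$
Equivalently, since $\mu_1 > 50$ ,
$\frac{1}{n}\sum_{i=1}^n x_i \ge (-72k'-50^2n + \mu_1^2n)/(2(\mu_1-50)n)$ ,
that is, $\bar{x}\ge c$ with $c = (-72k'-50^2n + \mu_1^2n)/(2(\mu_1-50)n)$ .
For a fixed $\mu_1$ , $c$ is just a function of $k'$ , so we can choose $c$ directly so that
$P_{H_0}(\bar{X}\geq c)=\alpha$ .
Since $\bar{X} \sim N(50, 36/n)$ under $H_0$ , we get $c=50+z_{\alpha}\sqrt{36/n}$ . This rejection region does not depend on $\mu_1$ , so the test is uniformly most powerful for $H_1:\mu>50$ .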
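The key point of this example is that, for every $\mu_1 > 50$, the log-likelihood difference is a decreasing function of $\bar{x}$, so the region "difference $\le k'$" always has the form $\bar{x} \ge c$, and the cutoff is fixed by the null distribution alone. The sketch below checks this numerically (Python standard library; the function names, the sample size $n = 16$, and the grids of values are illustrative assumptions).

```python
from statistics import NormalDist

def log_lr(xbar, n, mu1, mu0=50.0, var=36.0):
    """log L(mu0) - log L(mu1), which depends on the data only through xbar."""
    return -(n / (2 * var)) * (2 * (mu1 - mu0) * xbar + mu0**2 - mu1**2)

def critical_value(n, alpha=0.05, mu0=50.0, var=36.0):
    """c with P_{H0}(Xbar >= c) = alpha; note that mu1 does not appear."""
    return mu0 + NormalDist().inv_cdf(1 - alpha) * (var / n) ** 0.5

n = 16                       # illustrative sample size
xbars = [48.0, 50.0, 52.0, 54.0]
for mu1 in (51.0, 55.0, 60.0):
    values = [log_lr(x, n, mu1) for x in xbars]
    # For every mu1 > 50 the log-ratio decreases in xbar, so
    # "log-ratio <= k'" is always a region of the form "xbar >= c".
    print(mu1, all(a > b for a, b in zip(values, values[1:])))
print(round(critical_value(n), 3))
```

Each alternative prints `True`, and `critical_value` takes no `mu1` argument at all, which mirrors the argument that the same rejection region is most powerful against every $\mu_1 > 50$.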

Likelihood Ratio Tests (LRT)

The previous NP Lemma provides a method of constructing most powerful tests for simple hypotheses (i.e., under assumed hypotheses, the distribution of the observations is known).

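We also saw that for some one-sided problems, such as $H_0 : \theta = 1$ against $H_1 : \theta > 1$ with $\theta \in \mathbb{R}$ , the NP Lemma can still be used: we replace the alternative by a simple one such as $H_1:\theta=2$ and check that the rejection region depends only on the null value ( $1$ in the example).

However, consider testing $H_0:\mu=1$ against $H_1:\mu >1$ when $(X_1,\dots,X_n)$ is a random sample with $X_i \sim N(\mu, \sigma^2)$ and both $\mu$ and $\sigma^2$ are unknown. Even if we test $H_0:\mu=1$ against $H_1:\mu =2$ , both hypotheses are still composite, and we cannot use the previous rejection region based on the NP-Lemma.

We now present a very general method, the Likelihood Ratio Test (LRT), that can be used to derive tests of hypotheses.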

Likelihood Ratio Test

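Suppose we test $H_0: \theta \in \Theta_0$ against $H_1:\theta \in \Theta_1$ , where $\Theta_0$ and $\Theta_1$ are sets of possible values of $\theta$ with $\Theta_0\cup \Theta_1 = \Theta$ .

Define the likelihood ratio $\lambda(x_1,\dots,x_n)$ by

\lambda(x_1,\dots,x_n) = \frac{L(\hat{\theta}_{0};x_1,\dots,x_n)}{L(\hat{\theta};x_1,\dots,x_n)}= \frac{\max_{\theta\in\Theta_0} L(\theta;x_1,\dots,x_n)}{\max_{\theta\in\Theta}L(\theta;x_1,\dots,x_n)}.

Here $\hat{\theta}_0$ and $\hat{\theta}$ denote the maximizers of the likelihood over $\Theta_0$ and $\Theta$ , respectively. The likelihood ratio test based on $\lambda(X_1,\dots,X_n)$ rejects $H_0$ if $\lambda(X_1,\dots,X_n) \le k$ . The value of $k$ should be chosen based on the desired significance level of the test.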

Remark 1: the previous method based on the NP-Lemma can be regarded as a special case of LRT.

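Remark 2: $\lambda(x_1,\dots,x_n)\le 1$ for any $(x_1,\dots,x_n)$ , since $\Theta_0 \subseteq \Theta$ .

Example: Let $(X_1,\dots,X_n)$ be a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$ . Find the likelihood ratio test for $H_0 :\mu = \mu_0$ against $H_1 :\mu \ne \mu_0$ with $\alpha = .05$ .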

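In the example above, the likelihood ratio test can be written in terms of a test statistic $T$ whose distribution under $H_0$ is known, so we can find a rejection region with significance level $\alpha$ . Unfortunately, the likelihood ratio test does not always produce a test statistic with known null distribution.

In such cases, we can approximate the null distribution of $\lambda(X_1,\dots,X_n)$ when the sample size is large. The approximation for $\lambda(X_1,\dots,X_n)$ applies when $\Theta_0$ is specified by restriction of the parameter space as

$H_0: \theta \in \Theta_0$ $\leftrightarrow$ some coordinates of $\theta$ take fixed values.

For example, with $\theta = (\mu,\sigma^2)$ , $H_0: (\mu,\sigma^2) \in \Theta_0$ $\leftrightarrow$ $\theta = (\mu_0,\sigma^2)$ .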

Under some mild regularity conditions, we have the following Theorem.

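Theorem (asymptotic LRT; Wilks' Theorem): Let $X_1, X_2,..., X_n$ be a random sample with likelihood function $L(\theta;x_1,\dots,x_n)$ . Let $r_0$ denote the number of free parameters under $H_0 : \theta \in \Theta_0$ and let $r$ denote the number of free parameters under $\theta \in \Theta$ . Then, for large $n$ , $T=-2 \log(\lambda(X_1,\dots,X_n))$ has approximately a $\chi^2$ distribution with $r - r_0$ degrees of freedom under $H_0$ . $T=-2 \log(\lambda(X_1,\dots,X_n))$ is called a LRT (Likelihood Ratio Test) Statistic.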

The proof of this result uses a Taylor expansion and CLT and is beyond the scope of this class.
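Although the proof is out of scope, the approximation can be checked by simulation. The sketch below is a hypothetical experiment, not part of the notes: it simulates normal samples under $H_0: \mu = \mu_0$ with $\sigma^2$ unknown, computes $-2\log\lambda = n\log\left(\sum(x_i-\mu_0)^2/\sum(x_i-\bar{x})^2\right)$ (the closed form of $\lambda$ for the normal model), and records how often it exceeds $\chi^2_{.05}(1) = 3.841$. The seed, $n = 36$, $\sigma = 6$, and the number of replications are arbitrary choices.

```python
import random
from math import log
from statistics import fmean

def lrt_stat(xs, mu0):
    """-2 log(lambda) for H0: mu = mu0 in N(mu, sigma^2) with sigma^2 unknown."""
    n = len(xs)
    xbar = fmean(xs)
    ss_full = sum((x - xbar) ** 2 for x in xs)   # n * MLE of sigma^2 over Theta
    ss_null = sum((x - mu0) ** 2 for x in xs)    # n * MLE of sigma^2 under H0
    return n * log(ss_null / ss_full)

rng = random.Random(415)          # fixed seed so the run is reproducible
n, reps, mu0 = 36, 5000, 1.0
crit = 3.841                      # chi^2_{.05}(1)

# Simulate under H0 (mu = mu0; sigma = 6 is an arbitrary choice).
rejections = sum(
    lrt_stat([rng.gauss(mu0, 6.0) for _ in range(n)], mu0) >= crit
    for _ in range(reps)
)
rate = rejections / reps
print(rate)   # should be close to 0.05 if the chi-square approximation is reasonable
```

Here $r = 2$ and $r_0 = 1$, so the reference distribution has one degree of freedom. For moderate $n$ the simulated rejection rate is typically slightly above the nominal 5%, since the chi-square distribution is only a large-sample approximation.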

Example: Suppose we have a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$ . We test $H_0 :\mu = 1$ against $H_1 :\mu \ne 1$ . The sample mean is $2.5$ and the sample standard deviation is $6$ , with $n=36$ . Use $\alpha = .05$ .

Here $\theta= (\mu,\sigma^2)$ .
The likelihood ratio test rejects $H_0$ if
$\lambda(x_1,\dots,x_n) = \frac{L(\hat{\theta}_{0};x_1,\dots,x_n)}{L(\hat{\theta};x_1,\dots,x_n)}= \frac{\max_{\theta\in\Theta_0} L(\theta;x_1,\dots,x_n)}{\max_{\theta\in\Theta}L(\theta;x_1,\dots,x_n)} \le k$
Exact test
We choose $k$ so that $P(\lambda(X_1,\dots,X_n) \le k; \theta \in \Theta_0) \le \alpha$ .
From the above example, we know that
$\lambda(X_1,\dots,X_n) \le k \leftrightarrow T \le -k'$ or $T \ge k'$ , where $T = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$ .
Under $H_0$ , $\frac{\bar{X}-\mu_0}{S/\sqrt{n}} \sim T(n-1)$ , so $k' = t_{\alpha/2}(n-1) = t_{.025}(35)=2.03$ .
We reject $H_0$ if $|T_{obs}| \ge 2.03$ . Since $T_{obs} = \frac{\bar{x}-1}{s/\sqrt{n}} = 1.5 < 2.03$ , we do not reject $H_0.$
Asymptotic LRT Test
We choose $k$ so that $P(\lambda(X_1,\dots,X_n) \le k; \theta \in \Theta_0)=P(-2\log \lambda(X_1,\dots,X_n) \ge -2\log k; \theta \in \Theta_0) \le \alpha$ .
Since $-2\log \lambda(X_1,\dots,X_n) \overset{H_0}{\sim} \chi^2_1$ approximately (here $r - r_0 = 2 - 1 = 1$ ), we choose $k$ so that $-2\log k = \chi^2_{\alpha}(1)$ .
We reject $H_0$ if $-2\log \lambda(x_1,\dots,x_n) \ge \chi^2_{.05}(1) = 3.841$ .
For this model, $\lambda(x_1,\dots,x_n) = (\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \mu_0)^2})^{n/2}$ .
Since $\sum_{i=1}^n (x_i - \mu_0)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x}-\mu_0)^2 = 35\cdot 36 + 36\cdot (2.5-1)^2 = 1341$ ,
$-2\log\lambda(x_1,\dots,x_n ) = -2(36/2)\log (\frac{35\cdot36}{ 35\cdot 36 + 36\cdot (2.5-1)^2}) = -2 \cdot 18\cdot \log(0.9396) = 2.243$ .
Since $2.243 < 3.841$ , we do not reject $H_0$ .
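Both statistics in this example can be reproduced from the summary values $n = 36$, $\bar{x} = 2.5$, $s = 6$. The sketch below uses only the Python standard library; since the standard library has no chi-square distribution, the approximate p-value uses the identity $P(\chi^2_1 \ge x) = 2(1-\Phi(\sqrt{x}))$, which holds for one degree of freedom only.

```python
from math import log, sqrt
from statistics import NormalDist

n, xbar, s, mu0 = 36, 2.5, 6.0, 1.0

t_obs = (xbar - mu0) / (s / sqrt(n))            # exact-test statistic
ss_full = (n - 1) * s**2                        # sum of (x_i - xbar)^2
ss_null = ss_full + n * (xbar - mu0) ** 2       # sum of (x_i - mu0)^2
lrt = n * log(ss_null / ss_full)                # -2 log(lambda)

# For 1 degree of freedom, P(chi^2_1 >= x) = 2 * (1 - Phi(sqrt(x))).
p_value = 2 * (1 - NormalDist().cdf(sqrt(lrt)))

print(round(t_obs, 3), round(lrt, 3), round(p_value, 3))
# The two statistics are linked by -2 log(lambda) = n * log(1 + t^2 / (n - 1)).
```

The link $-2\log\lambda = n\log(1 + T^2/(n-1))$ shows that the likelihood ratio is a monotone function of $|T|$, which is why the exact test and the asymptotic test reach the same conclusion here.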
